Predict Bike Sharing Demand with AutoGluon Template¶

Project: Predict Bike Sharing Demand with AutoGluon¶

This notebook is a template with each step that you need to complete for the project.

Please fill in your code where there are explicit ? markers in the notebook. You are welcome to add more cells and code as you see fit.

Once you have completed all the code implementations, please export your notebook as a HTML file so the reviews can view your code. Make sure you have all outputs correctly outputted.

File-> Export Notebook As... -> Export Notebook as HTML

There is a writeup to complete as well after all code implememtation is done. Please answer all questions and attach the necessary tables and charts. You can complete the writeup in either markdown or PDF.

Completing the code template and writeup template will cover all of the rubric points for this project.

The rubric contains "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. The stand out suggestions are optional. If you decide to pursue the "stand out suggestions", you can include the code in this notebook and also discuss the results in the writeup file.

Step 1: Create an account with Kaggle¶

Create Kaggle Account and download API key¶

Below is example of steps to get the API username and key. Each student will have their own username and key.

  1. Open account settings. kaggle1.png kaggle2.png
  2. Scroll down to API and click Create New API Token. kaggle3.png kaggle4.png
  3. Open up kaggle.json and use the username and key. kaggle5.png

Step 2: Download the Kaggle dataset using the kaggle python library¶

Open up Sagemaker Studio and use starter template¶

  1. Notebook should be using a ml.t3.medium instance (2 vCPU + 4 GiB)
  2. Notebook should be using kernal: Python 3 (MXNet 1.8 Python 3.7 CPU Optimized)

Install packages¶

In [2]:
!pip install -U pip
!pip install -U setuptools wheel
!pip install -U "mxnet<2.0.0" bokeh==2.0.1
!pip install autogluon --no-cache-dir
# Without --no-cache-dir, smaller aws instances may have trouble installing
Requirement already satisfied: pip in /usr/local/lib/python3.7/site-packages (21.3.1)
Collecting pip
  Using cached pip-22.3-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 21.3.1
    Uninstalling pip-21.3.1:
      Successfully uninstalled pip-21.3.1
Successfully installed pip-22.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (59.4.0)
Collecting setuptools
  Using cached setuptools-65.5.0-py3-none-any.whl (1.2 MB)
Collecting wheel
  Using cached wheel-0.37.1-py2.py3-none-any.whl (35 kB)
Installing collected packages: wheel, setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 59.4.0
    Uninstalling setuptools-59.4.0:
      Successfully uninstalled setuptools-59.4.0
Successfully installed setuptools-65.5.0 wheel-0.37.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Collecting mxnet<2.0.0
  Using cached mxnet-1.9.1-py3-none-manylinux2014_x86_64.whl (49.1 MB)
Collecting bokeh==2.0.1
  Using cached bokeh-2.0.1-py3-none-any.whl
Requirement already satisfied: pillow>=4.0 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (8.4.0)
Requirement already satisfied: typing-extensions>=3.7.4 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (4.0.1)
Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (2.8.2)
Requirement already satisfied: tornado>=5 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (6.1)
Requirement already satisfied: packaging>=16.8 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (21.3)
Requirement already satisfied: Jinja2>=2.7 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (3.0.3)
Requirement already satisfied: PyYAML>=3.10 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (5.4.1)
Requirement already satisfied: numpy>=1.11.3 in /usr/local/lib/python3.7/site-packages (from bokeh==2.0.1) (1.19.1)
Requirement already satisfied: requests<3,>=2.20.0 in /usr/local/lib/python3.7/site-packages (from mxnet<2.0.0) (2.22.0)
Requirement already satisfied: graphviz<0.9.0,>=0.8.1 in /usr/local/lib/python3.7/site-packages (from mxnet<2.0.0) (0.8.4)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.7/site-packages (from Jinja2>=2.7->bokeh==2.0.1) (2.0.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.7/site-packages (from packaging>=16.8->bokeh==2.0.1) (3.0.6)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/site-packages (from python-dateutil>=2.1->bokeh==2.0.1) (1.16.0)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/site-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (1.25.11)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests<3,>=2.20.0->mxnet<2.0.0) (2021.10.8)
Installing collected packages: mxnet, bokeh
  Attempting uninstall: bokeh
    Found existing installation: bokeh 2.4.2
    Uninstalling bokeh-2.4.2:
      Successfully uninstalled bokeh-2.4.2
Successfully installed bokeh-2.0.1 mxnet-1.9.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
Collecting autogluon
  Downloading autogluon-0.5.2-py3-none-any.whl (9.6 kB)
Collecting autogluon.tabular[all]==0.5.2
  Downloading autogluon.tabular-0.5.2-py3-none-any.whl (274 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 274.2/274.2 kB 104.9 MB/s eta 0:00:00
Collecting autogluon.core[all]==0.5.2
  Downloading autogluon.core-0.5.2-py3-none-any.whl (210 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 210.4/210.4 kB 184.7 MB/s eta 0:00:00
Collecting autogluon.vision==0.5.2
  Downloading autogluon.vision-0.5.2-py3-none-any.whl (48 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.8/48.8 kB 156.2 MB/s eta 0:00:00
Collecting autogluon.multimodal==0.5.2
  Downloading autogluon.multimodal-0.5.2-py3-none-any.whl (149 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 149.4/149.4 kB 193.6 MB/s eta 0:00:00
Collecting autogluon.features==0.5.2
  Downloading autogluon.features-0.5.2-py3-none-any.whl (59 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 59.4/59.4 kB 137.1 MB/s eta 0:00:00
Collecting autogluon.text==0.5.2
  Downloading autogluon.text-0.5.2-py3-none-any.whl (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.9/61.9 kB 156.6 MB/s eta 0:00:00
Collecting autogluon.timeseries[all]==0.5.2
  Downloading autogluon.timeseries-0.5.2-py3-none-any.whl (65 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 65.4/65.4 kB 113.3 MB/s eta 0:00:00
Collecting distributed<=2021.11.2,>=2021.09.1
  Downloading distributed-2021.11.2-py3-none-any.whl (802 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 802.2/802.2 kB 180.0 MB/s eta 0:00:00
Requirement already satisfied: matplotlib in /usr/local/lib/python3.7/site-packages (from autogluon.core[all]==0.5.2->autogluon) (3.5.0)
Requirement already satisfied: boto3 in /usr/local/lib/python3.7/site-packages (from autogluon.core[all]==0.5.2->autogluon) (1.20.17)
Requirement already satisfied: tqdm>=4.38.0 in /usr/local/lib/python3.7/site-packages (from autogluon.core[all]==0.5.2->autogluon) (4.39.0)
Collecting scipy<1.8.0,>=1.5.4
  Downloading scipy-1.7.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (38.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 38.1/38.1 MB 158.2 MB/s eta 0:00:0000:0100:01
Requirement already satisfied: pandas!=1.4.0,<1.5,>=1.2.5 in /usr/local/lib/python3.7/site-packages (from autogluon.core[all]==0.5.2->autogluon) (1.3.4)
Collecting dask<=2021.11.2,>=2021.09.1
  Downloading dask-2021.11.2-py3-none-any.whl (1.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 215.7 MB/s eta 0:00:00
Collecting autogluon.common==0.5.2
  Downloading autogluon.common-0.5.2-py3-none-any.whl (37 kB)
Requirement already satisfied: scikit-learn<1.1,>=1.0.0 in /usr/local/lib/python3.7/site-packages (from autogluon.core[all]==0.5.2->autogluon) (1.0.1)
Requirement already satisfied: requests in /usr/local/lib/python3.7/site-packages (from autogluon.core[all]==0.5.2->autogluon) (2.22.0)
Collecting numpy<1.23,>=1.21
  Downloading numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.7/15.7 MB 155.3 MB/s eta 0:00:00a 0:00:01
Collecting hyperopt<0.2.8,>=0.2.7
  Downloading hyperopt-0.2.7-py2.py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 210.7 MB/s eta 0:00:00
Collecting ray[tune]<1.14,>=1.13
  Downloading ray-1.13.0-cp37-cp37m-manylinux2014_x86_64.whl (54.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.5/54.5 MB 149.8 MB/s eta 0:00:00a 0:00:01
Requirement already satisfied: psutil<6,>=5.7.3 in /usr/local/lib/python3.7/site-packages (from autogluon.features==0.5.2->autogluon) (5.8.0)
Collecting torch<1.13,>=1.9
  Downloading torch-1.12.1-cp37-cp37m-manylinux1_x86_64.whl (776.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 776.3/776.3 MB 163.8 MB/s eta 0:00:0000:0100:01
Collecting torchvision<0.14.0
  Downloading torchvision-0.13.1-cp37-cp37m-manylinux1_x86_64.whl (19.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.1/19.1 MB 154.7 MB/s eta 0:00:00a 0:00:01
Collecting omegaconf<2.2.0,>=2.1.1
  Downloading omegaconf-2.1.2-py3-none-any.whl (74 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 74.7/74.7 kB 162.7 MB/s eta 0:00:00
Collecting Pillow<9.1.0,>=9.0.1
  Downloading Pillow-9.0.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.3/4.3 MB 154.8 MB/s eta 0:00:00
Collecting smart-open<5.3.0,>=5.2.1
  Downloading smart_open-5.2.1-py3-none-any.whl (58 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58.6/58.6 kB 155.5 MB/s eta 0:00:00
Collecting nlpaug<=1.1.10,>=1.1.10
  Downloading nlpaug-1.1.10-py3-none-any.whl (410 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.8/410.8 kB 205.2 MB/s eta 0:00:00
Collecting torchmetrics<0.8.0,>=0.7.2
  Downloading torchmetrics-0.7.3-py3-none-any.whl (398 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 398.2/398.2 kB 204.1 MB/s eta 0:00:00
Collecting nptyping<1.5.0,>=1.4.4
  Downloading nptyping-1.4.4-py3-none-any.whl (31 kB)
Collecting scikit-image<0.20.0,>=0.19.1
  Downloading scikit_image-0.19.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (13.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.5/13.5 MB 142.7 MB/s eta 0:00:00a 0:00:01
Collecting timm<0.6.0
  Downloading timm-0.5.4-py3-none-any.whl (431 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 431.5/431.5 kB 196.8 MB/s eta 0:00:00
Collecting fairscale<=0.4.6,>=0.4.5
  Downloading fairscale-0.4.6.tar.gz (248 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 248.2/248.2 kB 196.4 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting nltk<4.0.0,>=3.4.5
  Downloading nltk-3.7-py3-none-any.whl (1.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 211.7 MB/s eta 0:00:00
Collecting pytorch-lightning<1.7.0,>=1.6.0
  Downloading pytorch_lightning-1.6.5-py3-none-any.whl (585 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 585.9/585.9 kB 199.9 MB/s eta 0:00:00
Collecting pytorch-metric-learning<1.4.0,>=1.3.0
  Downloading pytorch_metric_learning-1.3.2-py3-none-any.whl (109 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 109.4/109.4 kB 157.0 MB/s eta 0:00:00
Collecting torchtext<0.14.0
  Downloading torchtext-0.13.1-cp37-cp37m-manylinux1_x86_64.whl (1.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 MB 214.0 MB/s eta 0:00:00
Collecting protobuf<=3.18.1
  Downloading protobuf-3.18.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 205.0 MB/s eta 0:00:00
Collecting sentencepiece<0.2.0,>=0.1.95
  Downloading sentencepiece-0.1.97-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 191.2 MB/s eta 0:00:00
Collecting transformers<4.21.0,>=4.18.0
  Downloading transformers-4.20.1-py3-none-any.whl (4.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.4/4.4 MB 183.8 MB/s eta 0:00:00
Requirement already satisfied: networkx<3.0,>=2.3 in /usr/local/lib/python3.7/site-packages (from autogluon.tabular[all]==0.5.2->autogluon) (2.6.3)
Collecting xgboost<1.5,>=1.4
  Downloading xgboost-1.4.2-py3-none-manylinux2010_x86_64.whl (166.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.7/166.7 MB 170.5 MB/s eta 0:00:00a 0:00:01
Collecting lightgbm<3.4,>=3.3
  Downloading lightgbm-3.3.3-py3-none-manylinux1_x86_64.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 181.3 MB/s eta 0:00:00
Collecting catboost<1.1,>=1.0
  Downloading catboost-1.0.6-cp37-none-manylinux1_x86_64.whl (76.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76.6/76.6 MB 159.9 MB/s eta 0:00:00a 0:00:01
Collecting fastai<2.8,>=2.3.1
  Downloading fastai-2.7.9-py3-none-any.whl (225 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 225.5/225.5 kB 195.5 MB/s eta 0:00:00
Collecting autogluon-contrib-nlp==0.0.1b20220208
  Downloading autogluon_contrib_nlp-0.0.1b20220208-py3-none-any.whl (157 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 157.3/157.3 kB 193.9 MB/s eta 0:00:00
Collecting gluonts<0.10.0,>=0.8.0
  Downloading gluonts-0.9.9-py3-none-any.whl (2.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.8/2.8 MB 189.1 MB/s eta 0:00:00
Collecting tbats~=1.1
  Downloading tbats-1.1.1-py3-none-any.whl (43 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43.8/43.8 kB 137.8 MB/s eta 0:00:00
Collecting pmdarima~=1.8.2
  Downloading pmdarima-1.8.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 210.5 MB/s eta 0:00:00
Collecting sktime~=0.11.4
  Downloading sktime-0.11.4-py3-none-any.whl (6.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.7/6.7 MB 184.5 MB/s eta 0:00:00
Collecting gluoncv<0.10.6,>=0.10.5
  Downloading gluoncv-0.10.5.post0-py2.py3-none-any.whl (1.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 209.9 MB/s eta 0:00:00
Collecting flake8
  Downloading flake8-5.0.4-py2.py3-none-any.whl (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.9/61.9 kB 153.3 MB/s eta 0:00:00
Requirement already satisfied: pyarrow in /usr/local/lib/python3.7/site-packages (from autogluon-contrib-nlp==0.0.1b20220208->autogluon.text==0.5.2->autogluon) (6.0.1)
Collecting sacrebleu
  Downloading sacrebleu-2.3.1-py3-none-any.whl (118 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 118.9/118.9 kB 185.6 MB/s eta 0:00:00
Collecting sacremoses>=0.0.38
  Downloading sacremoses-0.0.53.tar.gz (880 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 880.6/880.6 kB 209.0 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting regex
  Downloading regex-2022.9.13-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (757 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 757.0/757.0 kB 213.1 MB/s eta 0:00:00
Collecting contextvars
  Downloading contextvars-2.4.tar.gz (9.6 kB)
  Preparing metadata (setup.py) ... done
Collecting sentencepiece<0.2.0,>=0.1.95
  Downloading sentencepiece-0.1.95-cp37-cp37m-manylinux2014_x86_64.whl (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 212.8 MB/s eta 0:00:00
Collecting yacs>=0.1.6
  Downloading yacs-0.1.8-py3-none-any.whl (14 kB)
Collecting tokenizers>=0.9.4
  Downloading tokenizers-0.13.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.6/7.6 MB 154.6 MB/s eta 0:00:00a 0:00:01
Requirement already satisfied: six in /usr/local/lib/python3.7/site-packages (from catboost<1.1,>=1.0->autogluon.tabular[all]==0.5.2->autogluon) (1.16.0)
Requirement already satisfied: graphviz in /usr/local/lib/python3.7/site-packages (from catboost<1.1,>=1.0->autogluon.tabular[all]==0.5.2->autogluon) (0.8.4)
Requirement already satisfied: plotly in /usr/local/lib/python3.7/site-packages (from catboost<1.1,>=1.0->autogluon.tabular[all]==0.5.2->autogluon) (5.4.0)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.7/site-packages (from dask<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.5.2->autogluon) (5.4.1)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.7/site-packages (from dask<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.5.2->autogluon) (21.3)
Collecting toolz>=0.8.2
  Downloading toolz-0.12.0-py3-none-any.whl (55 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 55.8/55.8 kB 137.4 MB/s eta 0:00:00
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.7/site-packages (from dask<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.5.2->autogluon) (2.0.0)
Collecting partd>=0.3.10
  Downloading partd-1.3.0-py3-none-any.whl (18 kB)
Requirement already satisfied: fsspec>=0.6.0 in /usr/local/lib/python3.7/site-packages (from dask<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.5.2->autogluon) (2021.11.1)
Collecting zict>=0.1.3
  Downloading zict-2.2.0-py2.py3-none-any.whl (23 kB)
Requirement already satisfied: tornado>=5 in /usr/local/lib/python3.7/site-packages (from distributed<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.5.2->autogluon) (6.1)
Collecting click>=6.6
  Downloading click-8.1.3-py3-none-any.whl (96 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.6/96.6 kB 176.0 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/local/lib/python3.7/site-packages (from distributed<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.5.2->autogluon) (65.5.0)
Collecting sortedcontainers!=2.0.0,!=2.0.1
  Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Collecting tblib>=1.6.0
  Downloading tblib-1.7.0-py2.py3-none-any.whl (12 kB)
Collecting msgpack>=0.6.0
  Downloading msgpack-1.0.4-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (299 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 299.8/299.8 kB 166.9 MB/s eta 0:00:00
Requirement already satisfied: jinja2 in /usr/local/lib/python3.7/site-packages (from distributed<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.5.2->autogluon) (3.0.3)
Collecting fastdownload<2,>=0.0.5
  Downloading fastdownload-0.0.7-py3-none-any.whl (12 kB)
Requirement already satisfied: pip in /usr/local/lib/python3.7/site-packages (from fastai<2.8,>=2.3.1->autogluon.tabular[all]==0.5.2->autogluon) (22.3)
Collecting spacy<4
  Downloading spacy-3.4.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.4/6.4 MB 168.5 MB/s eta 0:00:00a 0:00:01
Collecting fastprogress>=0.2.4
  Downloading fastprogress-1.0.3-py3-none-any.whl (12 kB)
Collecting fastcore<1.6,>=1.4.5
  Downloading fastcore-1.5.27-py3-none-any.whl (67 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 67.1/67.1 kB 165.8 MB/s eta 0:00:00
Requirement already satisfied: portalocker in /usr/local/lib/python3.7/site-packages (from gluoncv<0.10.6,>=0.10.5->autogluon.vision==0.5.2->autogluon) (2.3.2)
Collecting autocfg
  Downloading autocfg-0.0.8-py3-none-any.whl (13 kB)
Requirement already satisfied: opencv-python in /usr/local/lib/python3.7/site-packages (from gluoncv<0.10.6,>=0.10.5->autogluon.vision==0.5.2->autogluon) (4.5.4.60)
Collecting pydantic~=1.1
  Downloading pydantic-1.10.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.8/11.8 MB 154.0 MB/s eta 0:00:00a 0:00:01
Collecting holidays>=0.9
  Downloading holidays-0.16-py3-none-any.whl (184 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 184.6/184.6 kB 197.7 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.7/site-packages (from gluonts<0.10.0,>=0.8.0->autogluon.timeseries[all]==0.5.2->autogluon) (4.0.1)
Collecting py4j
  Downloading py4j-0.10.9.7-py2.py3-none-any.whl (200 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.5/200.5 kB 198.8 MB/s eta 0:00:00
Collecting future
  Downloading future-0.18.2.tar.gz (829 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 829.2/829.2 kB 213.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: wheel in /usr/local/lib/python3.7/site-packages (from lightgbm<3.4,>=3.3->autogluon.tabular[all]==0.5.2->autogluon) (0.37.1)
Requirement already satisfied: setuptools-scm>=4 in /usr/local/lib/python3.7/site-packages (from matplotlib->autogluon.core[all]==0.5.2->autogluon) (6.3.2)
Requirement already satisfied: pyparsing>=2.2.1 in /usr/local/lib/python3.7/site-packages (from matplotlib->autogluon.core[all]==0.5.2->autogluon) (3.0.6)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.7/site-packages (from matplotlib->autogluon.core[all]==0.5.2->autogluon) (0.11.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.7/site-packages (from matplotlib->autogluon.core[all]==0.5.2->autogluon) (1.3.2)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.7/site-packages (from matplotlib->autogluon.core[all]==0.5.2->autogluon) (2.8.2)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.7/site-packages (from matplotlib->autogluon.core[all]==0.5.2->autogluon) (4.28.2)
Requirement already satisfied: joblib in /usr/local/lib/python3.7/site-packages (from nltk<4.0.0,>=3.4.5->autogluon.multimodal==0.5.2->autogluon) (1.1.0)
Collecting typish>=1.7.0
  Downloading typish-1.9.3-py3-none-any.whl (45 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.1/45.1 kB 74.3 MB/s eta 0:00:00
Collecting antlr4-python3-runtime==4.8
  Downloading antlr4-python3-runtime-4.8.tar.gz (112 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 112.4/112.4 kB 146.8 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.7/site-packages (from pandas!=1.4.0,<1.5,>=1.2.5->autogluon.core[all]==0.5.2->autogluon) (2021.3)
Requirement already satisfied: Cython!=0.29.18,>=0.29 in /usr/local/lib/python3.7/site-packages (from pmdarima~=1.8.2->autogluon.timeseries[all]==0.5.2->autogluon) (0.29.24)
Collecting statsmodels!=0.12.0,>=0.11
  Downloading statsmodels-0.13.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.8/9.8 MB 171.1 MB/s eta 0:00:00a 0:00:01
Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/site-packages (from pmdarima~=1.8.2->autogluon.timeseries[all]==0.5.2->autogluon) (1.25.11)
Collecting tqdm>=4.38.0
  Downloading tqdm-4.64.1-py2.py3-none-any.whl (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.5/78.5 kB 173.0 MB/s eta 0:00:00
Collecting pyDeprecate>=0.3.1
  Downloading pyDeprecate-0.3.2-py3-none-any.whl (10 kB)
Collecting tensorboard>=2.2.0
  Downloading tensorboard-2.10.1-py3-none-any.whl (5.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.9/5.9 MB 179.1 MB/s eta 0:00:00
Collecting virtualenv
  Downloading virtualenv-20.16.5-py3-none-any.whl (8.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.8/8.8 MB 171.8 MB/s eta 0:00:00a 0:00:01
Collecting aiosignal
  Downloading aiosignal-1.2.0-py3-none-any.whl (8.2 kB)
Collecting filelock
  Downloading filelock-3.8.0-py3-none-any.whl (10 kB)
Collecting jsonschema
  Downloading jsonschema-4.16.0-py3-none-any.whl (83 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 83.1/83.1 kB 167.8 MB/s eta 0:00:00
Collecting frozenlist
  Downloading frozenlist-1.3.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (148 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 148.0/148.0 kB 192.4 MB/s eta 0:00:00
Collecting click>=6.6
  Downloading click-8.0.4-py3-none-any.whl (97 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.5/97.5 kB 174.3 MB/s eta 0:00:00
Collecting grpcio<=1.43.0,>=1.28.1
  Downloading grpcio-1.43.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.1/4.1 MB 200.9 MB/s eta 0:00:00
Requirement already satisfied: attrs in /usr/local/lib/python3.7/site-packages (from ray[tune]<1.14,>=1.13->autogluon.core[all]==0.5.2->autogluon) (21.2.0)
Requirement already satisfied: tabulate in /usr/local/lib/python3.7/site-packages (from ray[tune]<1.14,>=1.13->autogluon.core[all]==0.5.2->autogluon) (0.8.9)
Collecting tensorboardX>=1.9
  Downloading tensorboardX-2.5.1-py2.py3-none-any.whl (125 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.4/125.4 kB 137.4 MB/s eta 0:00:00
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests->autogluon.core[all]==0.5.2->autogluon) (2.8)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/site-packages (from requests->autogluon.core[all]==0.5.2->autogluon) (2021.10.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests->autogluon.core[all]==0.5.2->autogluon) (3.0.4)
Collecting PyWavelets>=1.1.1
  Downloading PyWavelets-1.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.4/6.4 MB 155.0 MB/s eta 0:00:00a 0:00:01
Collecting tifffile>=2019.7.26
  Downloading tifffile-2021.11.2-py3-none-any.whl (178 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 178.9/178.9 kB 181.1 MB/s eta 0:00:00
Requirement already satisfied: imageio>=2.4.1 in /usr/local/lib/python3.7/site-packages (from scikit-image<0.20.0,>=0.19.1->autogluon.multimodal==0.5.2->autogluon) (2.13.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.7/site-packages (from scikit-learn<1.1,>=1.0.0->autogluon.core[all]==0.5.2->autogluon) (3.0.0)
Requirement already satisfied: numba>=0.53 in /usr/local/lib/python3.7/site-packages (from sktime~=0.11.4->autogluon.timeseries[all]==0.5.2->autogluon) (0.53.1)
Collecting deprecated>=1.2.13
  Downloading Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.10.1-py3-none-any.whl (163 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 163.5/163.5 kB 186.8 MB/s eta 0:00:00
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/site-packages (from transformers<4.21.0,>=4.18.0->autogluon.multimodal==0.5.2->autogluon) (4.8.2)
Collecting tokenizers>=0.9.4
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.6/6.6 MB 161.4 MB/s eta 0:00:00a 0:00:01
Requirement already satisfied: botocore<1.24.0,>=1.23.17 in /usr/local/lib/python3.7/site-packages (from boto3->autogluon.core[all]==0.5.2->autogluon) (1.23.17)
Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /usr/local/lib/python3.7/site-packages (from boto3->autogluon.core[all]==0.5.2->autogluon) (0.10.0)
Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /usr/local/lib/python3.7/site-packages (from boto3->autogluon.core[all]==0.5.2->autogluon) (0.5.0)
Collecting wrapt<2,>=1.10
  Downloading wrapt-1.14.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (75 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 75.2/75.2 kB 165.5 MB/s eta 0:00:00
Collecting aiohttp
  Downloading aiohttp-3.8.3-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (948 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 948.0/948.0 kB 207.6 MB/s eta 0:00:00
Collecting korean-lunar-calendar
  Downloading korean_lunar_calendar-0.3.1-py3-none-any.whl (9.0 kB)
Collecting convertdate>=2.3.0
  Downloading convertdate-2.4.0-py3-none-any.whl (47 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.9/47.9 kB 143.8 MB/s eta 0:00:00
Collecting hijri-converter
  Downloading hijri_converter-2.2.4-py3-none-any.whl (14 kB)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in /usr/local/lib/python3.7/site-packages (from numba>=0.53->sktime~=0.11.4->autogluon.timeseries[all]==0.5.2->autogluon) (0.36.0)
Collecting locket
  Downloading locket-1.0.0-py2.py3-none-any.whl (4.4 kB)
Collecting typing-extensions~=4.0
  Downloading typing_extensions-4.4.0-py3-none-any.whl (26 kB)
Requirement already satisfied: tomli>=1.0.0 in /usr/local/lib/python3.7/site-packages (from setuptools-scm>=4->matplotlib->autogluon.core[all]==0.5.2->autogluon) (1.2.2)
Collecting spacy-loggers<2.0.0,>=1.0.0
  Downloading spacy_loggers-1.0.3-py3-none-any.whl (9.3 kB)
Collecting typing-extensions~=4.0
  Downloading typing_extensions-4.1.1-py3-none-any.whl (26 kB)
Collecting catalogue<2.1.0,>=2.0.6
  Downloading catalogue-2.0.8-py3-none-any.whl (17 kB)
Collecting preshed<3.1.0,>=3.0.2
  Downloading preshed-3.0.8-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (126 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 126.6/126.6 kB 186.6 MB/s eta 0:00:00
Collecting pathy>=0.3.5
  Downloading pathy-0.6.2-py3-none-any.whl (42 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.8/42.8 kB 131.2 MB/s eta 0:00:00
Collecting spacy-legacy<3.1.0,>=3.0.10
  Downloading spacy_legacy-3.0.10-py2.py3-none-any.whl (21 kB)
Collecting wasabi<1.1.0,>=0.9.1
  Downloading wasabi-0.10.1-py3-none-any.whl (26 kB)
Collecting thinc<8.2.0,>=8.1.0
  Downloading thinc-8.1.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (806 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 806.2/806.2 kB 210.8 MB/s eta 0:00:00
Collecting cymem<2.1.0,>=2.0.2
  Downloading cymem-2.0.7-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (36 kB)
Collecting langcodes<4.0.0,>=3.2.0
  Downloading langcodes-3.3.0-py3-none-any.whl (181 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 181.6/181.6 kB 197.0 MB/s eta 0:00:00
Collecting murmurhash<1.1.0,>=0.28.0
  Downloading murmurhash-1.0.9-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (21 kB)
Collecting typer<0.5.0,>=0.3.0
  Downloading typer-0.4.2-py3-none-any.whl (27 kB)
Collecting srsly<3.0.0,>=2.4.3
  Downloading srsly-2.4.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (490 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 490.0/490.0 kB 209.0 MB/s eta 0:00:00
Collecting patsy>=0.5.2
  Downloading patsy-0.5.3-py2.py3-none-any.whl (233 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 233.8/233.8 kB 155.3 MB/s eta 0:00:00
Collecting google-auth-oauthlib<0.5,>=0.4.1
  Downloading google_auth_oauthlib-0.4.6-py2.py3-none-any.whl (18 kB)
Collecting absl-py>=0.4
  Downloading absl_py-1.3.0-py3-none-any.whl (124 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.6/124.6 kB 182.1 MB/s eta 0:00:00
Collecting tensorboard-data-server<0.7.0,>=0.6.0
  Downloading tensorboard_data_server-0.6.1-py3-none-manylinux2010_x86_64.whl (4.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.9/4.9 MB 202.7 MB/s eta 0:00:00
Collecting tensorboard-plugin-wit>=1.6.0
  Downloading tensorboard_plugin_wit-1.8.1-py3-none-any.whl (781 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 781.3/781.3 kB 212.8 MB/s eta 0:00:00
Collecting google-auth<3,>=1.6.3
  Downloading google_auth-2.13.0-py2.py3-none-any.whl (174 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 174.5/174.5 kB 183.1 MB/s eta 0:00:00
Requirement already satisfied: werkzeug>=1.0.1 in /usr/local/lib/python3.7/site-packages (from tensorboard>=2.2.0->pytorch-lightning<1.7.0,>=1.6.0->autogluon.multimodal==0.5.2->autogluon) (2.0.2)
Collecting markdown>=2.6.8
  Downloading Markdown-3.4.1-py3-none-any.whl (93 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 93.3/93.3 kB 162.7 MB/s eta 0:00:00
Collecting heapdict
  Downloading HeapDict-1.0.1-py3-none-any.whl (3.9 kB)
Collecting immutables>=0.9
  Downloading immutables-0.19-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (117 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 117.0/117.0 kB 177.7 MB/s eta 0:00:00
Collecting pyflakes<2.6.0,>=2.5.0
  Downloading pyflakes-2.5.0-py2.py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.1/66.1 kB 158.6 MB/s eta 0:00:00
Collecting pycodestyle<2.10.0,>=2.9.0
  Downloading pycodestyle-2.9.1-py2.py3-none-any.whl (41 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 41.5/41.5 kB 120.1 MB/s eta 0:00:00
Collecting importlib-metadata
  Downloading importlib_metadata-4.2.0-py3-none-any.whl (16 kB)
Collecting mccabe<0.8.0,>=0.7.0
  Downloading mccabe-0.7.0-py2.py3-none-any.whl (7.3 kB)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/site-packages (from importlib-metadata->transformers<4.21.0,>=4.18.0->autogluon.multimodal==0.5.2->autogluon) (3.6.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.7/site-packages (from jinja2->distributed<=2021.11.2,>=2021.09.1->autogluon.core[all]==0.5.2->autogluon) (2.0.1)
Collecting pkgutil-resolve-name>=1.3.10
  Downloading pkgutil_resolve_name-1.3.10-py3-none-any.whl (4.7 kB)
Collecting importlib-resources>=1.4.0
  Downloading importlib_resources-5.10.0-py3-none-any.whl (34 kB)
Collecting pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0
  Downloading pyrsistent-0.18.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (117 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 117.1/117.1 kB 165.5 MB/s eta 0:00:00
Requirement already satisfied: tenacity>=6.2.0 in /usr/local/lib/python3.7/site-packages (from plotly->catboost<1.1,>=1.0->autogluon.tabular[all]==0.5.2->autogluon) (8.0.1)
Requirement already satisfied: colorama in /usr/local/lib/python3.7/site-packages (from sacrebleu->autogluon-contrib-nlp==0.0.1b20220208->autogluon.text==0.5.2->autogluon) (0.4.3)
Collecting lxml
  Downloading lxml-4.9.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (6.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.4/6.4 MB 165.8 MB/s eta 0:00:00a 0:00:01
Collecting virtualenv
  Downloading virtualenv-20.16.4-py3-none-any.whl (8.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.8/8.8 MB 151.8 MB/s eta 0:00:00a 0:00:01
  Downloading virtualenv-20.16.3-py2.py3-none-any.whl (8.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.8/8.8 MB 158.9 MB/s eta 0:00:00a 0:00:01
  Downloading virtualenv-20.16.2-py2.py3-none-any.whl (8.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.8/8.8 MB 164.5 MB/s eta 0:00:00a 0:00:01
Collecting platformdirs<3,>=2
  Downloading platformdirs-2.5.2-py3-none-any.whl (14 kB)
Collecting distlib<1,>=0.3.1
  Downloading distlib-0.3.6-py2.py3-none-any.whl (468 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 468.5/468.5 kB 205.3 MB/s eta 0:00:00
Collecting pymeeus<=1,>=0.3.13
  Downloading PyMeeus-0.5.11.tar.gz (5.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.4/5.4 MB 184.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting cachetools<6.0,>=2.0.0
  Downloading cachetools-5.2.0-py3-none-any.whl (9.3 kB)
Collecting pyasn1-modules>=0.2.1
  Downloading pyasn1_modules-0.2.8-py2.py3-none-any.whl (155 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 155.3/155.3 kB 184.3 MB/s eta 0:00:00
Requirement already satisfied: rsa<5,>=3.1.4 in /usr/local/lib/python3.7/site-packages (from google-auth<3,>=1.6.3->tensorboard>=2.2.0->pytorch-lightning<1.7.0,>=1.6.0->autogluon.multimodal==0.5.2->autogluon) (4.7.2)
Collecting requests-oauthlib>=0.7.0
  Downloading requests_oauthlib-1.3.1-py2.py3-none-any.whl (23 kB)
Collecting markdown>=2.6.8
  Downloading Markdown-3.4-py3-none-any.whl (93 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 93.3/93.3 kB 157.6 MB/s eta 0:00:00
  Downloading Markdown-3.3.7-py3-none-any.whl (97 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.8/97.8 kB 168.3 MB/s eta 0:00:00
  Downloading Markdown-3.3.6-py3-none-any.whl (97 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.8/97.8 kB 168.8 MB/s eta 0:00:00
  Downloading Markdown-3.3.4-py3-none-any.whl (97 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.6/97.6 kB 159.5 MB/s eta 0:00:00
Collecting confection<1.0.0,>=0.0.1
  Downloading confection-0.0.3-py3-none-any.whl (32 kB)
Collecting blis<0.8.0,>=0.7.8
  Downloading blis-0.7.9-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.2/10.2 MB 162.1 MB/s eta 0:00:00a 0:00:01
Collecting yarl<2.0,>=1.0
  Downloading yarl-1.8.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (231 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 231.3/231.3 kB 189.2 MB/s eta 0:00:00
Collecting asynctest==0.13.0
  Downloading asynctest-0.13.0-py3-none-any.whl (26 kB)
Collecting multidict<7.0,>=4.5
  Downloading multidict-6.0.2-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (94 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 94.8/94.8 kB 156.5 MB/s eta 0:00:00
Collecting charset-normalizer<3.0,>=2.0
  Downloading charset_normalizer-2.1.1-py3-none-any.whl (39 kB)
Collecting async-timeout<5.0,>=4.0.0a3
  Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/local/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard>=2.2.0->pytorch-lightning<1.7.0,>=1.6.0->autogluon.multimodal==0.5.2->autogluon) (0.4.8)
Collecting oauthlib>=3.0.0
  Downloading oauthlib-3.2.2-py3-none-any.whl (151 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 151.7/151.7 kB 184.9 MB/s eta 0:00:00
Building wheels for collected packages: fairscale, antlr4-python3-runtime, sacremoses, contextvars, future, pymeeus
  Building wheel for fairscale (pyproject.toml) ... done
  Created wheel for fairscale: filename=fairscale-0.4.6-py3-none-any.whl size=307225 sha256=46a37bbe98045b798d5241f86d0d509641a8c9f0ab4d9da1d562e550776a86ca
  Stored in directory: /tmp/pip-ephem-wheel-cache-jwvr3ikj/wheels/0b/8c/fa/a9e102632bcb86e919561cf25ca1e0dd2ec67476f3a5544653
  Building wheel for antlr4-python3-runtime (setup.py) ... done
  Created wheel for antlr4-python3-runtime: filename=antlr4_python3_runtime-4.8-py3-none-any.whl size=141210 sha256=4df6d785de039497734234d1d0aeecbd195bc00b06345cf6cea90dc7f52c70c7
  Stored in directory: /tmp/pip-ephem-wheel-cache-jwvr3ikj/wheels/c9/ef/75/1b8c6588a8a8a15d5a9136608a9d65172a226577e7ae89da31
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.53-py3-none-any.whl size=895241 sha256=96cf58e0a8f1c80589e55abd69e9d2f1306887e101a1e89e6c719bd1ae1dab4e
  Stored in directory: /tmp/pip-ephem-wheel-cache-jwvr3ikj/wheels/5b/e0/77/05245143a5b31f65af6a21f7afd3219e9fa4896f918af45677
  Building wheel for contextvars (setup.py) ... done
  Created wheel for contextvars: filename=contextvars-2.4-py3-none-any.whl size=7664 sha256=eb553d92b3ea8cef289dc634b1d9d673beef50154818dd621fdca42f40409f86
  Stored in directory: /tmp/pip-ephem-wheel-cache-jwvr3ikj/wheels/1b/4f/f6/2cf0b56beceeb4a516c29f1a061522603b2db256b1c9930fee
  Building wheel for future (setup.py) ... done
  Created wheel for future: filename=future-0.18.2-py3-none-any.whl size=491058 sha256=08100d14b25e6b5583691164f4315b4f706dd1cf88827a6b0a913821ee23f32f
  Stored in directory: /tmp/pip-ephem-wheel-cache-jwvr3ikj/wheels/3e/3c/b4/7132d27620dd551cf00823f798a7190e7320ae7ffb71d1e989
  Building wheel for pymeeus (setup.py) ... done
  Created wheel for pymeeus: filename=PyMeeus-0.5.11-py3-none-any.whl size=730971 sha256=38b9660916c4fbe80e43f62f75c113ec6af0a63c9fd029398cb93a5b97ffaada
  Stored in directory: /tmp/pip-ephem-wheel-cache-jwvr3ikj/wheels/bc/17/d4/0095e29d942940d5653b55f8503c4940e1fad226352c98c0d8
Successfully built fairscale antlr4-python3-runtime sacremoses contextvars future pymeeus
Installing collected packages: wasabi, typish, tokenizers, tensorboard-plugin-wit, sortedcontainers, sentencepiece, pymeeus, py4j, msgpack, korean-lunar-calendar, heapdict, distlib, cymem, antlr4-python3-runtime, zict, yacs, wrapt, typing-extensions, tqdm, toolz, tensorboard-data-server, tblib, spacy-loggers, spacy-legacy, smart-open, regex, pyrsistent, pyflakes, pyDeprecate, pycodestyle, pyasn1-modules, protobuf, platformdirs, pkgutil-resolve-name, Pillow, omegaconf, oauthlib, numpy, murmurhash, multidict, mccabe, lxml, locket, langcodes, importlib-resources, hijri-converter, grpcio, future, frozenlist, filelock, fastprogress, convertdate, charset-normalizer, cachetools, autocfg, asynctest, absl-py, yarl, torch, tifffile, tensorboardX, scipy, sacrebleu, requests-oauthlib, PyWavelets, pydantic, preshed, patsy, partd, nptyping, importlib-metadata, immutables, holidays, google-auth, fastcore, deprecated, catalogue, blis, async-timeout, aiosignal, xgboost, virtualenv, torchvision, torchtext, torchmetrics, statsmodels, srsly, scikit-image, nlpaug, markdown, jsonschema, hyperopt, huggingface-hub, google-auth-oauthlib, flake8, fastdownload, fairscale, dask, contextvars, click, aiohttp, typer, transformers, timm, tensorboard, sktime, sacremoses, ray, pytorch-metric-learning, pmdarima, nltk, lightgbm, gluonts, gluoncv, distributed, confection, catboost, thinc, tbats, pytorch-lightning, pathy, autogluon-contrib-nlp, autogluon.common, spacy, autogluon.features, autogluon.core, fastai, autogluon.vision, autogluon.timeseries, autogluon.tabular, autogluon.multimodal, autogluon.text, autogluon
  Attempting uninstall: typing-extensions
    Found existing installation: typing_extensions 4.0.1
    Uninstalling typing_extensions-4.0.1:
      Successfully uninstalled typing_extensions-4.0.1
  Attempting uninstall: tqdm
    Found existing installation: tqdm 4.39.0
    Uninstalling tqdm-4.39.0:
      Successfully uninstalled tqdm-4.39.0
  Attempting uninstall: protobuf
    Found existing installation: protobuf 3.19.1
    Uninstalling protobuf-3.19.1:
      Successfully uninstalled protobuf-3.19.1
  Attempting uninstall: Pillow
    Found existing installation: Pillow 8.4.0
    Uninstalling Pillow-8.4.0:
      Successfully uninstalled Pillow-8.4.0
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.1
    Uninstalling numpy-1.19.1:
      Successfully uninstalled numpy-1.19.1
  Attempting uninstall: scipy
    Found existing installation: scipy 1.4.1
    Uninstalling scipy-1.4.1:
      Successfully uninstalled scipy-1.4.1
  Attempting uninstall: importlib-metadata
    Found existing installation: importlib-metadata 4.8.2
    Uninstalling importlib-metadata-4.8.2:
      Successfully uninstalled importlib-metadata-4.8.2
  Attempting uninstall: gluoncv
    Found existing installation: gluoncv 0.8.0
    Uninstalling gluoncv-0.8.0:
      Successfully uninstalled gluoncv-0.8.0
Successfully installed Pillow-9.0.1 PyWavelets-1.3.0 absl-py-1.3.0 aiohttp-3.8.3 aiosignal-1.2.0 antlr4-python3-runtime-4.8 async-timeout-4.0.2 asynctest-0.13.0 autocfg-0.0.8 autogluon-0.5.2 autogluon-contrib-nlp-0.0.1b20220208 autogluon.common-0.5.2 autogluon.core-0.5.2 autogluon.features-0.5.2 autogluon.multimodal-0.5.2 autogluon.tabular-0.5.2 autogluon.text-0.5.2 autogluon.timeseries-0.5.2 autogluon.vision-0.5.2 blis-0.7.9 cachetools-5.2.0 catalogue-2.0.8 catboost-1.0.6 charset-normalizer-2.1.1 click-8.0.4 confection-0.0.3 contextvars-2.4 convertdate-2.4.0 cymem-2.0.7 dask-2021.11.2 deprecated-1.2.13 distlib-0.3.6 distributed-2021.11.2 fairscale-0.4.6 fastai-2.7.9 fastcore-1.5.27 fastdownload-0.0.7 fastprogress-1.0.3 filelock-3.8.0 flake8-5.0.4 frozenlist-1.3.1 future-0.18.2 gluoncv-0.10.5.post0 gluonts-0.9.9 google-auth-2.13.0 google-auth-oauthlib-0.4.6 grpcio-1.43.0 heapdict-1.0.1 hijri-converter-2.2.4 holidays-0.16 huggingface-hub-0.10.1 hyperopt-0.2.7 immutables-0.19 importlib-metadata-4.2.0 importlib-resources-5.10.0 jsonschema-4.16.0 korean-lunar-calendar-0.3.1 langcodes-3.3.0 lightgbm-3.3.3 locket-1.0.0 lxml-4.9.1 markdown-3.3.4 mccabe-0.7.0 msgpack-1.0.4 multidict-6.0.2 murmurhash-1.0.9 nlpaug-1.1.10 nltk-3.7 nptyping-1.4.4 numpy-1.21.6 oauthlib-3.2.2 omegaconf-2.1.2 partd-1.3.0 pathy-0.6.2 patsy-0.5.3 pkgutil-resolve-name-1.3.10 platformdirs-2.5.2 pmdarima-1.8.5 preshed-3.0.8 protobuf-3.18.1 py4j-0.10.9.7 pyDeprecate-0.3.2 pyasn1-modules-0.2.8 pycodestyle-2.9.1 pydantic-1.10.2 pyflakes-2.5.0 pymeeus-0.5.11 pyrsistent-0.18.1 pytorch-lightning-1.6.5 pytorch-metric-learning-1.3.2 ray-1.13.0 regex-2022.9.13 requests-oauthlib-1.3.1 sacrebleu-2.3.1 sacremoses-0.0.53 scikit-image-0.19.3 scipy-1.7.3 sentencepiece-0.1.95 sktime-0.11.4 smart-open-5.2.1 sortedcontainers-2.4.0 spacy-3.4.2 spacy-legacy-3.0.10 spacy-loggers-1.0.3 srsly-2.4.5 statsmodels-0.13.2 tbats-1.1.1 tblib-1.7.0 tensorboard-2.10.1 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorboardX-2.5.1 thinc-8.1.5 tifffile-2021.11.2 timm-0.5.4 tokenizers-0.12.1 toolz-0.12.0 torch-1.12.1 torchmetrics-0.7.3 torchtext-0.13.1 torchvision-0.13.1 tqdm-4.64.1 transformers-4.20.1 typer-0.4.2 typing-extensions-4.1.1 typish-1.9.3 virtualenv-20.16.2 wasabi-0.10.1 wrapt-1.14.1 xgboost-1.4.2 yacs-0.1.8 yarl-1.8.1 zict-2.2.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

Setup Kaggle API Key¶

In [3]:
# create the .kaggle directory and an empty kaggle.json file
!mkdir -p /root/.kaggle
!touch /root/.kaggle/kaggle.json
!chmod 600 /root/.kaggle/kaggle.json
In [4]:
# Fill in your user name and key from creating the kaggle account and API token file
import json
kaggle_username = "asmaahanine"
kaggle_key = "3c216d230d35ca764c12c0b34e1a09b0"

# Save API token the kaggle.json file
with open("/root/.kaggle/kaggle.json", "w") as f:
    f.write(json.dumps({"username": kaggle_username, "key": kaggle_key}))

Download and explore dataset¶

Go to the bike sharing demand competition and agree to the terms¶

kaggle6.png

In [3]:
!pip install kaggle
Collecting kaggle
  Using cached kaggle-1.5.12-py3-none-any.whl
Requirement already satisfied: certifi in /usr/local/lib/python3.7/site-packages (from kaggle) (2021.10.8)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/site-packages (from kaggle) (1.25.11)
Requirement already satisfied: python-dateutil in /usr/local/lib/python3.7/site-packages (from kaggle) (2.8.2)
Collecting python-slugify
  Using cached python_slugify-6.1.2-py2.py3-none-any.whl (9.4 kB)
Requirement already satisfied: requests in /usr/local/lib/python3.7/site-packages (from kaggle) (2.22.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.7/site-packages (from kaggle) (4.64.1)
Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.7/site-packages (from kaggle) (1.16.0)
Collecting text-unidecode>=1.3
  Using cached text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
Requirement already satisfied: idna<2.9,>=2.5 in /usr/local/lib/python3.7/site-packages (from requests->kaggle) (2.8)
Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /usr/local/lib/python3.7/site-packages (from requests->kaggle) (3.0.4)
Installing collected packages: text-unidecode, python-slugify, kaggle
Successfully installed kaggle-1.5.12 python-slugify-6.1.2 text-unidecode-1.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
In [15]:
# Download the dataset, it will be in a .zip file so you'll need to unzip it as well.
!kaggle competitions download -c bike-sharing-demand
# If you already downloaded it you can use the -o command to overwrite the file
bike-sharing-demand.zip: Skipping, found more recently modified local copy (use --force to force download)
Archive:  bike-sharing-demand.zip
replace sampleSubmission.csv? [y]es, [n]o, [A]ll, [N]one, [r]ename: ^C
/bin/sh: 1: y: not found
In [4]:
import pandas as pd
from autogluon.tabular import TabularPredictor
import seaborn as sns
/usr/local/lib/python3.7/site-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
In [5]:
# Create the train dataset in pandas by reading the csv
# Set the parsing of the datetime column so you can use some of the `dt` features in pandas later
train = pd.read_csv("train.csv")
# Parsing the datetime column
train.loc[:, "datetime"] = pd.to_datetime(train.loc[:, "datetime"])
train.head()
Out[5]:
datetime season holiday workingday weather temp atemp humidity windspeed casual registered count
0 2011-01-01 00:00:00 1 0 0 1 9.84 14.395 81 0.0 3 13 16
1 2011-01-01 01:00:00 1 0 0 1 9.02 13.635 80 0.0 8 32 40
2 2011-01-01 02:00:00 1 0 0 1 9.02 13.635 80 0.0 5 27 32
3 2011-01-01 03:00:00 1 0 0 1 9.84 14.395 75 0.0 3 10 13
4 2011-01-01 04:00:00 1 0 0 1 9.84 14.395 75 0.0 0 1 1
In [6]:
# Simple output of the train dataset to view some of the min/max/varition of the dataset features.
train.describe()
Out[6]:
season holiday workingday weather temp atemp humidity windspeed casual registered count
count 10886.000000 10886.000000 10886.000000 10886.000000 10886.00000 10886.000000 10886.000000 10886.000000 10886.000000 10886.000000 10886.000000
mean 2.506614 0.028569 0.680875 1.418427 20.23086 23.655084 61.886460 12.799395 36.021955 155.552177 191.574132
std 1.116174 0.166599 0.466159 0.633839 7.79159 8.474601 19.245033 8.164537 49.960477 151.039033 181.144454
min 1.000000 0.000000 0.000000 1.000000 0.82000 0.760000 0.000000 0.000000 0.000000 0.000000 1.000000
25% 2.000000 0.000000 0.000000 1.000000 13.94000 16.665000 47.000000 7.001500 4.000000 36.000000 42.000000
50% 3.000000 0.000000 1.000000 1.000000 20.50000 24.240000 62.000000 12.998000 17.000000 118.000000 145.000000
75% 4.000000 0.000000 1.000000 2.000000 26.24000 31.060000 77.000000 16.997900 49.000000 222.000000 284.000000
max 4.000000 1.000000 1.000000 4.000000 41.00000 45.455000 100.000000 56.996900 367.000000 886.000000 977.000000

We can notice that :

  • None of the features is centered, we will have to standardize them in the following sections.
  • The season, holiday, working day and weather columns are encoded as integers. We will need to change the encoding in the following sections as well.
  • The datetime column is missing a lot of data.
In [9]:
# Printing informations about the dataset
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10886 entries, 0 to 10885
Data columns (total 12 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   datetime    10886 non-null  datetime64[ns]
 1   season      10886 non-null  int64         
 2   holiday     10886 non-null  int64         
 3   workingday  10886 non-null  int64         
 4   weather     10886 non-null  int64         
 5   temp        10886 non-null  float64       
 6   atemp       10886 non-null  float64       
 7   humidity    10886 non-null  int64         
 8   windspeed   10886 non-null  float64       
 9   casual      10886 non-null  int64         
 10  registered  10886 non-null  int64         
 11  count       10886 non-null  int64         
dtypes: datetime64[ns](1), float64(3), int64(8)
memory usage: 1020.7 KB
In [8]:
# Create the test pandas dataframe in pandas by reading the csv, remember to parse the datetime!
test = pd.read_csv("test.csv")
# Parsing the datetime column
test.loc[:, "datetime"] = pd.to_datetime(test.loc[:, "datetime"])

test.head()
Out[8]:
datetime season holiday workingday weather temp atemp humidity windspeed
0 2011-01-20 00:00:00 1 0 1 1 10.66 11.365 56 26.0027
1 2011-01-20 01:00:00 1 0 1 1 10.66 13.635 56 0.0000
2 2011-01-20 02:00:00 1 0 1 1 10.66 13.635 56 0.0000
3 2011-01-20 03:00:00 1 0 1 1 10.66 12.880 56 11.0014
4 2011-01-20 04:00:00 1 0 1 1 10.66 12.880 56 11.0014
In [20]:
# Same thing as train and test dataset
submission = pd.read_csv("sampleSubmission.csv")
submission.head()
Out[20]:
datetime count
0 2011-01-20 00:00:00 0
1 2011-01-20 01:00:00 0
2 2011-01-20 02:00:00 0
3 2011-01-20 03:00:00 0
4 2011-01-20 04:00:00 0

Step 3: Train a model using AutoGluon’s Tabular Prediction¶

Requirements:

  • We are prediting count, so it is the label we are setting.
  • Ignore casual and registered columns as they are also not present in the test dataset.
  • Use the root_mean_squared_error as the metric to use for evaluation.
  • Set a time limit of 10 minutes (600 seconds).
  • Use the preset best_quality to focus on creating the best model.
In [51]:
predictor = TabularPredictor(label="count", problem_type="regression", eval_metric="root_mean_squared_error").fit(
    train_data=train.loc[:, train.columns.difference(["casual","registered"])], time_limit=600, presets="best_quality"
)
No path specified. Models will be saved in: "AutogluonModels/ag-20221019_194002/"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20221019_194002/"
AutoGluon Version:  0.5.2
Python Version:     3.7.10
Operating System:   Linux
Train Data Rows:    10886
Train Data Columns: 9
Label Column: count
Preprocessing data ...
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    2049.17 MB
	Train Data (Original)  Memory Usage: 0.78 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 2 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting DatetimeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('datetime', []) : 1 | ['datetime']
		('float', [])    : 3 | ['atemp', 'temp', 'windspeed']
		('int', [])      : 5 | ['holiday', 'humidity', 'season', 'weather', 'workingday']
	Types of features in processed data (raw dtype, special dtypes):
		('float', [])                : 3 | ['atemp', 'temp', 'windspeed']
		('int', [])                  : 3 | ['humidity', 'season', 'weather']
		('int', ['bool'])            : 2 | ['holiday', 'workingday']
		('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
	0.5s = Fit runtime
	9 features in original data used to generate 13 features in processed data.
	Train Data (Processed) Memory Usage: 0.98 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.57s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
	This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
	To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.52s of the 599.41s of remaining time.
	-101.5462	 = Validation score   (-root_mean_squared_error)
	0.05s	 = Training   runtime
	0.1s	 = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 399.1s of the 598.99s of remaining time.
	-84.1251	 = Validation score   (-root_mean_squared_error)
	0.05s	 = Training   runtime
	0.1s	 = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 398.7s of the 598.59s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-131.4127	 = Validation score   (-root_mean_squared_error)
	73.17s	 = Training   runtime
	8.61s	 = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 318.82s of the 518.71s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-131.0484	 = Validation score   (-root_mean_squared_error)
	27.7s	 = Training   runtime
	1.28s	 = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 287.48s of the 487.37s of remaining time.
	-116.6324	 = Validation score   (-root_mean_squared_error)
	11.37s	 = Training   runtime
	0.53s	 = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 272.83s of the 472.72s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-130.6008	 = Validation score   (-root_mean_squared_error)
	194.46s	 = Training   runtime
	0.13s	 = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 75.19s of the 275.08s of remaining time.
	-124.4967	 = Validation score   (-root_mean_squared_error)
	4.84s	 = Training   runtime
	0.52s	 = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 67.18s of the 267.07s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-137.175	 = Validation score   (-root_mean_squared_error)
	74.35s	 = Training   runtime
	0.42s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 189.46s of remaining time.
	-84.1251	 = Validation score   (-root_mean_squared_error)
	0.69s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 188.69s of the 188.67s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-60.5701	 = Validation score   (-root_mean_squared_error)
	48.96s	 = Training   runtime
	2.9s	 = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 136.19s of the 136.18s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-55.1061	 = Validation score   (-root_mean_squared_error)
	24.2s	 = Training   runtime
	0.23s	 = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 109.0s of the 108.98s of remaining time.
	-53.2786	 = Validation score   (-root_mean_squared_error)
	26.37s	 = Training   runtime
	0.59s	 = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 79.6s of the 79.58s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-55.633	 = Validation score   (-root_mean_squared_error)
	71.56s	 = Training   runtime
	0.09s	 = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 5.03s of the 5.01s of remaining time.
	-53.776	 = Validation score   (-root_mean_squared_error)
	9.1s	 = Training   runtime
	0.59s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -7.32s of remaining time.
	-52.7555	 = Validation score   (-root_mean_squared_error)
	0.51s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 608.04s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221019_194002/")

Review AutoGluon's training run with ranking of models that did the best.¶

In [11]:
predictor.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
                     model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0      WeightedEnsemble_L3  -52.729069      14.257364  445.129123                0.001256           0.433843            3       True         15
1   RandomForestMSE_BAG_L2  -53.324848      13.405554  412.855981                0.590444          25.980774            2       True         12
2     ExtraTreesMSE_BAG_L2  -53.717484      13.400470  395.048464                0.585360           8.173257            2       True         14
3          LightGBM_BAG_L2  -54.975211      13.080305  410.541249                0.265195          23.666042            2       True         11
4          CatBoost_BAG_L2  -55.578418      12.894655  450.923761                0.079545          64.048554            2       True         13
5        LightGBMXT_BAG_L2  -60.497960      16.269038  438.927994                3.453928          52.052787            2       True         10
6    KNeighborsDist_BAG_L1  -84.125061       0.103696    0.033458                0.103696           0.033458            1       True          2
7      WeightedEnsemble_L2  -84.125061       0.104901    0.601218                0.001205           0.567760            2       True          9
8    KNeighborsUnif_BAG_L1 -101.546199       0.103069    0.114331                0.103069           0.114331            1       True          1
9   RandomForestMSE_BAG_L1 -116.632421       0.537841   10.762428                0.537841          10.762428            1       True          5
10    ExtraTreesMSE_BAG_L1 -124.496689       0.516548    4.834809                0.516548           4.834809            1       True          7
11         CatBoost_BAG_L1 -130.600759       0.103025  192.814319                0.103025         192.814319            1       True          6
12         LightGBM_BAG_L1 -131.048402       1.525897   29.324282                1.525897          29.324282            1       True          4
13       LightGBMXT_BAG_L1 -131.412741       9.414168   76.680547                9.414168          76.680547            1       True          3
14  NeuralNetFastAI_BAG_L1 -137.554677       0.510866   72.311033                0.510866          72.311033            1       True          8
Number of models trained: 15
Types of models trained:
{'StackerEnsembleModel_XT', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_NNFastAiTabular', 'WeightedEnsembleModel', 'StackerEnsembleModel_KNN', 'StackerEnsembleModel_CatBoost'}
Bagging used: True  (with 8 folds)
Multi-layer stack-ensembling used: True  (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('float', [])                : 3 | ['atemp', 'temp', 'windspeed']
('int', [])                  : 3 | ['humidity', 'season', 'weather']
('int', ['bool'])            : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20221019_181447/SummaryOfModels.html
*** End of fit() summary ***
Out[11]:
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
  'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
  'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
  'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel',
  'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
  'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
 'model_performance': {'KNeighborsUnif_BAG_L1': -101.54619908446061,
  'KNeighborsDist_BAG_L1': -84.12506123181602,
  'LightGBMXT_BAG_L1': -131.41274077052907,
  'LightGBM_BAG_L1': -131.04840164127194,
  'RandomForestMSE_BAG_L1': -116.63242058947374,
  'CatBoost_BAG_L1': -130.6007588943428,
  'ExtraTreesMSE_BAG_L1': -124.49668948784444,
  'NeuralNetFastAI_BAG_L1': -137.55467740409978,
  'WeightedEnsemble_L2': -84.12506123181602,
  'LightGBMXT_BAG_L2': -60.4979596005761,
  'LightGBM_BAG_L2': -54.97521083134642,
  'RandomForestMSE_BAG_L2': -53.32484832178292,
  'CatBoost_BAG_L2': -55.5784178479148,
  'ExtraTreesMSE_BAG_L2': -53.717483530403854,
  'WeightedEnsemble_L3': -52.729068758216826},
 'model_best': 'WeightedEnsemble_L3',
 'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20221019_181447/models/KNeighborsUnif_BAG_L1/',
  'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20221019_181447/models/KNeighborsDist_BAG_L1/',
  'LightGBMXT_BAG_L1': 'AutogluonModels/ag-20221019_181447/models/LightGBMXT_BAG_L1/',
  'LightGBM_BAG_L1': 'AutogluonModels/ag-20221019_181447/models/LightGBM_BAG_L1/',
  'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20221019_181447/models/RandomForestMSE_BAG_L1/',
  'CatBoost_BAG_L1': 'AutogluonModels/ag-20221019_181447/models/CatBoost_BAG_L1/',
  'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20221019_181447/models/ExtraTreesMSE_BAG_L1/',
  'NeuralNetFastAI_BAG_L1': 'AutogluonModels/ag-20221019_181447/models/NeuralNetFastAI_BAG_L1/',
  'WeightedEnsemble_L2': 'AutogluonModels/ag-20221019_181447/models/WeightedEnsemble_L2/',
  'LightGBMXT_BAG_L2': 'AutogluonModels/ag-20221019_181447/models/LightGBMXT_BAG_L2/',
  'LightGBM_BAG_L2': 'AutogluonModels/ag-20221019_181447/models/LightGBM_BAG_L2/',
  'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20221019_181447/models/RandomForestMSE_BAG_L2/',
  'CatBoost_BAG_L2': 'AutogluonModels/ag-20221019_181447/models/CatBoost_BAG_L2/',
  'ExtraTreesMSE_BAG_L2': 'AutogluonModels/ag-20221019_181447/models/ExtraTreesMSE_BAG_L2/',
  'WeightedEnsemble_L3': 'AutogluonModels/ag-20221019_181447/models/WeightedEnsemble_L3/'},
 'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.11433124542236328,
  'KNeighborsDist_BAG_L1': 0.03345799446105957,
  'LightGBMXT_BAG_L1': 76.68054699897766,
  'LightGBM_BAG_L1': 29.32428216934204,
  'RandomForestMSE_BAG_L1': 10.762428045272827,
  'CatBoost_BAG_L1': 192.8143186569214,
  'ExtraTreesMSE_BAG_L1': 4.834809064865112,
  'NeuralNetFastAI_BAG_L1': 72.31103277206421,
  'WeightedEnsemble_L2': 0.5677599906921387,
  'LightGBMXT_BAG_L2': 52.05278706550598,
  'LightGBM_BAG_L2': 23.66604208946228,
  'RandomForestMSE_BAG_L2': 25.98077416419983,
  'CatBoost_BAG_L2': 64.04855418205261,
  'ExtraTreesMSE_BAG_L2': 8.17325735092163,
  'WeightedEnsemble_L3': 0.43384289741516113},
 'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.10306930541992188,
  'KNeighborsDist_BAG_L1': 0.10369586944580078,
  'LightGBMXT_BAG_L1': 9.414168119430542,
  'LightGBM_BAG_L1': 1.5258972644805908,
  'RandomForestMSE_BAG_L1': 0.5378408432006836,
  'CatBoost_BAG_L1': 0.10302495956420898,
  'ExtraTreesMSE_BAG_L1': 0.5165479183197021,
  'NeuralNetFastAI_BAG_L1': 0.5108659267425537,
  'WeightedEnsemble_L2': 0.0012049674987792969,
  'LightGBMXT_BAG_L2': 3.453927516937256,
  'LightGBM_BAG_L2': 0.2651948928833008,
  'RandomForestMSE_BAG_L2': 0.5904438495635986,
  'CatBoost_BAG_L2': 0.0795445442199707,
  'ExtraTreesMSE_BAG_L2': 0.5853595733642578,
  'WeightedEnsemble_L3': 0.0012555122375488281},
 'num_bag_folds': 8,
 'max_stack_level': 3,
 'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'KNeighborsDist_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'LightGBMXT_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L2': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'WeightedEnsemble_L3': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True}},
 'leaderboard':                      model   score_val  pred_time_val    fit_time  \
 0      WeightedEnsemble_L3  -52.729069      14.257364  445.129123   
 1   RandomForestMSE_BAG_L2  -53.324848      13.405554  412.855981   
 2     ExtraTreesMSE_BAG_L2  -53.717484      13.400470  395.048464   
 3          LightGBM_BAG_L2  -54.975211      13.080305  410.541249   
 4          CatBoost_BAG_L2  -55.578418      12.894655  450.923761   
 5        LightGBMXT_BAG_L2  -60.497960      16.269038  438.927994   
 6    KNeighborsDist_BAG_L1  -84.125061       0.103696    0.033458   
 7      WeightedEnsemble_L2  -84.125061       0.104901    0.601218   
 8    KNeighborsUnif_BAG_L1 -101.546199       0.103069    0.114331   
 9   RandomForestMSE_BAG_L1 -116.632421       0.537841   10.762428   
 10    ExtraTreesMSE_BAG_L1 -124.496689       0.516548    4.834809   
 11         CatBoost_BAG_L1 -130.600759       0.103025  192.814319   
 12         LightGBM_BAG_L1 -131.048402       1.525897   29.324282   
 13       LightGBMXT_BAG_L1 -131.412741       9.414168   76.680547   
 14  NeuralNetFastAI_BAG_L1 -137.554677       0.510866   72.311033   
 
     pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  \
 0                 0.001256           0.433843            3       True   
 1                 0.590444          25.980774            2       True   
 2                 0.585360           8.173257            2       True   
 3                 0.265195          23.666042            2       True   
 4                 0.079545          64.048554            2       True   
 5                 3.453928          52.052787            2       True   
 6                 0.103696           0.033458            1       True   
 7                 0.001205           0.567760            2       True   
 8                 0.103069           0.114331            1       True   
 9                 0.537841          10.762428            1       True   
 10                0.516548           4.834809            1       True   
 11                0.103025         192.814319            1       True   
 12                1.525897          29.324282            1       True   
 13                9.414168          76.680547            1       True   
 14                0.510866          72.311033            1       True   
 
     fit_order  
 0          15  
 1          12  
 2          14  
 3          11  
 4          13  
 5          10  
 6           2  
 7           9  
 8           1  
 9           5  
 10          7  
 11          6  
 12          4  
 13          3  
 14          8  }

Let's plot training scores of the top performers of the tested models.

In [12]:
predictor.leaderboard(silent=True).plot(kind="bar", x="model", y="score_val")
Out[12]:
<AxesSubplot:xlabel='model'>
In [13]:
test.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6493 entries, 0 to 6492
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   datetime    6493 non-null   datetime64[ns]
 1   season      6493 non-null   int64         
 2   holiday     6493 non-null   int64         
 3   workingday  6493 non-null   int64         
 4   weather     6493 non-null   int64         
 5   temp        6493 non-null   float64       
 6   atemp       6493 non-null   float64       
 7   humidity    6493 non-null   int64         
 8   windspeed   6493 non-null   float64       
dtypes: datetime64[ns](1), float64(3), int64(5)
memory usage: 456.7 KB

Create predictions from test dataset¶

In [14]:
predictions = predictor.predict(test)

NOTE: Kaggle will reject the submission if we don't set everything to be > 0.¶

In [15]:
# Describe the `predictions` series to see if there are any negative values
print("Predictions:  \n", predictions)
Predictions:  
 0        24.271076
1        40.670219
2        44.759418
3        48.270988
4        51.002129
           ...    
6488    158.364975
6489    158.364975
6490    154.558319
6491    147.399673
6492    153.469833
Name: count, Length: 6493, dtype: float32
In [17]:
predictions.describe()
Out[17]:
count    6493.000000
mean      100.682121
std        90.388634
min         3.041300
25%        20.355358
50%        62.666965
75%       170.241760
max       363.189880
Name: count, dtype: float64
In [18]:
# How many negative values do we have?
print("Negative predictions are : \n", predictions[predictions<0])
Negative predictions are : 
 Series([], Name: count, dtype: float32)

We have no negative values.

In [19]:
# Set them to zero
predictions[predictions<0] = 0

Set predictions to submission dataframe, save, and submit¶

In [20]:
submission["count"] = predictions
submission.to_csv("submission.csv", index=False)
In [21]:
!kaggle competitions submit -c bike-sharing-demand -f submission.csv -m "3nd raw submission"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 365kB/s]
Successfully submitted to Bike Sharing Demand

View submission via the command line or in the web browser under the competition's page - My Submissions¶

In [22]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description           status    publicScore  privateScore  
---------------------------  -------------------  --------------------  --------  -----------  ------------  
submission.csv               2022-10-19 18:30:41  3nd raw submission    complete  1.80895      1.80895       
submission_new_features.csv  2022-10-16 22:46:40  new features          complete  1.80152      1.80152       
submission.csv               2022-10-16 20:05:41  2nd raw submission    complete  1.80406      1.80406       
submission.csv               2022-10-16 20:05:24  first raw submission  complete  1.80406      1.80406       

Initial score of 1.80406¶

Step 4: Exploratory Data Analysis and Creating an additional feature¶

  • Any additional feature will do, but a great suggestion would be to separate out the datetime into hour, day, or month parts.
In [11]:
# Create a histogram of all features to show the distribution of each one relative to the data. This is part of the exploritory data analysis
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
train.hist(ax = ax)
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:5: UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
  """
Out[11]:
array([[<AxesSubplot:title={'center':'datetime'}>,
        <AxesSubplot:title={'center':'season'}>,
        <AxesSubplot:title={'center':'holiday'}>],
       [<AxesSubplot:title={'center':'workingday'}>,
        <AxesSubplot:title={'center':'weather'}>,
        <AxesSubplot:title={'center':'temp'}>],
       [<AxesSubplot:title={'center':'atemp'}>,
        <AxesSubplot:title={'center':'humidity'}>,
        <AxesSubplot:title={'center':'windspeed'}>],
       [<AxesSubplot:title={'center':'casual'}>,
        <AxesSubplot:title={'center':'registered'}>,
        <AxesSubplot:title={'center':'count'}>]], dtype=object)
In [24]:
train.corr()
Out[24]:
season holiday workingday weather temp atemp humidity windspeed casual registered count
season 1.000000 0.029368 -0.008126 0.008879 0.258689 0.264744 0.190610 -0.147121 0.096758 0.164011 0.163439
holiday 0.029368 1.000000 -0.250491 -0.007074 0.000295 -0.005215 0.001929 0.008409 0.043799 -0.020956 -0.005393
workingday -0.008126 -0.250491 1.000000 0.033772 0.029966 0.024660 -0.010880 0.013373 -0.319111 0.119460 0.011594
weather 0.008879 -0.007074 0.033772 1.000000 -0.055035 -0.055376 0.406244 0.007261 -0.135918 -0.109340 -0.128655
temp 0.258689 0.000295 0.029966 -0.055035 1.000000 0.984948 -0.064949 -0.017852 0.467097 0.318571 0.394454
atemp 0.264744 -0.005215 0.024660 -0.055376 0.984948 1.000000 -0.043536 -0.057473 0.462067 0.314635 0.389784
humidity 0.190610 0.001929 -0.010880 0.406244 -0.064949 -0.043536 1.000000 -0.318607 -0.348187 -0.265458 -0.317371
windspeed -0.147121 0.008409 0.013373 0.007261 -0.017852 -0.057473 -0.318607 1.000000 0.092276 0.091052 0.101369
casual 0.096758 0.043799 -0.319111 -0.135918 0.467097 0.462067 -0.348187 0.092276 1.000000 0.497250 0.690414
registered 0.164011 -0.020956 0.119460 -0.109340 0.318571 0.314635 -0.265458 0.091052 0.497250 1.000000 0.970948
count 0.163439 -0.005393 0.011594 -0.128655 0.394454 0.389784 -0.317371 0.101369 0.690414 0.970948 1.000000
In [12]:
sns.clustermap(train.corr())
Out[12]:
<seaborn.matrix.ClusterGrid at 0x7f9d6d0a4110>

We can notice from the correlation matrix that the temp and atemp parameters are highly correlated. This makes sense, and we may want to only keep one parameter as keeping both would imply duplicating information. Other correlations are not as high.

In [14]:
sns.pairplot(train)
Out[14]:
<seaborn.axisgrid.PairGrid at 0x7f9d5f56af90>

We can notice from the pairplot a few things :

  • During the fall season, the use of bikes is higher than other seasons. The least amount of rentals have been recorded during the summer.
  • More rentals happen during working days than holidays or other non working days, this can be explained by the fact that people rent bikes to go to work.
  • More rentals are recorded when the sky is clear or mildly cloudy.
  • Humidity is not a very determining factor when it comes to bike rentals, as the count of rentals doesn't vary much as a function of humidity.
  • Rentals tend to be higher for felt temperatures and recorded temperatures between 25 and 35 degrees.
  • For windspeeds above 40 we have a low count of rentals. High amounts of rentals have been recorded for windspeeds under 30m/s.
  • There is a high correlation between registered rentals and total number of rentals, as it is more easy to track registered rentals in comparison to casual rentals.
In [20]:
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
train.sort_values('datetime').plot.line('datetime','count',ax = ax)
Out[20]:
<AxesSubplot:xlabel='datetime'>

The plot of counted rentals timeseries shows that the number of rentals increased from 2011 to 2013. We can also notice that within the same season, the number of rentals varies depending on the month. It would be interesting to add time features extracted from the dateime to the dataset to gain more insight on rental count variation for different granularity levels of time.

In [9]:
# create a new feature
# Separating datetime into year, month and day features
train_new = train
train_new["year"] = train_new.datetime.dt.year
train_new["month"] = train_new.datetime.dt.month
train_new["day"] = train_new.datetime.dt.day
train_new["hour"] = train_new.datetime.dt.hour

test_new = test
test_new["year"] = test_new.datetime.dt.year
test_new["month"] = test_new.datetime.dt.month
test_new["day"] = test_new.datetime.dt.day
test_new["hour"] = test_new.datetime.dt.hour
In [24]:
import matplotlib.pyplot as plt


fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 8))
train.plot(ax=axes[0, 0], x="year", y="count", kind="scatter")
train.plot(ax=axes[0, 1], x="month", y="count", kind="scatter")
train.plot(ax=axes[1, 0], x="day", y="count", kind="scatter")
train.plot(ax=axes[1, 1], x="hour", y="count", kind="scatter")
Out[24]:
<AxesSubplot:xlabel='hour', ylabel='count'>

We can notice that more bikes are rented at 08am and 5pm, which correspond to the start and end of work day hours.

In [22]:
# Create a histogram of all features to show the distribution of each one relative to the data. This is part of the exploritory data analysis
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
train_new.hist(ax = ax)
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:5: UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
  """
Out[22]:
array([[<AxesSubplot:title={'center':'datetime'}>,
        <AxesSubplot:title={'center':'season'}>,
        <AxesSubplot:title={'center':'holiday'}>,
        <AxesSubplot:title={'center':'workingday'}>],
       [<AxesSubplot:title={'center':'weather'}>,
        <AxesSubplot:title={'center':'temp'}>,
        <AxesSubplot:title={'center':'atemp'}>,
        <AxesSubplot:title={'center':'humidity'}>],
       [<AxesSubplot:title={'center':'windspeed'}>,
        <AxesSubplot:title={'center':'casual'}>,
        <AxesSubplot:title={'center':'registered'}>,
        <AxesSubplot:title={'center':'count'}>],
       [<AxesSubplot:title={'center':'year'}>,
        <AxesSubplot:title={'center':'month'}>,
        <AxesSubplot:title={'center':'day'}>,
        <AxesSubplot:title={'center':'hour'}>]], dtype=object)

Make category types for these so models know they are not just numbers¶

  • AutoGluon originally sees these as ints, but in reality they are int representations of a category.
  • Setting the dtype to category will classify these as categories in AutoGluon.
In [9]:
train_new["season"] = train_new.season.astype('category')
train_new["weather"] = train_new.weather.astype('category')
train_new["holiday"] = train_new.holiday.astype('category')
train_new["workingday"] = train_new.workingday.astype('category')

test_new["season"] = test_new.season.astype('category')
test_new["weather"] = test_new.weather.astype('category')
test_new["holiday"] = test_new.holiday.astype('category')
test_new["workingday"] = test_new.workingday.astype('category')
In [55]:
# View are new feature
train_new.head()
Out[55]:
datetime season holiday workingday weather temp atemp humidity windspeed casual registered count year month day hour
0 2011-01-01 00:00:00 1 0 0 1 9.84 14.395 81 0.0 3 13 16 2011 1 1 0
1 2011-01-01 01:00:00 1 0 0 1 9.02 13.635 80 0.0 8 32 40 2011 1 1 1
2 2011-01-01 02:00:00 1 0 0 1 9.02 13.635 80 0.0 5 27 32 2011 1 1 2
3 2011-01-01 03:00:00 1 0 0 1 9.84 14.395 75 0.0 3 10 13 2011 1 1 3
4 2011-01-01 04:00:00 1 0 0 1 9.84 14.395 75 0.0 0 1 1 2011 1 1 4
In [56]:
# View histogram of all features again now with the hour feature
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
train_new.hist(ax = ax)
/usr/local/lib/python3.7/site-packages/ipykernel_launcher.py:5: UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
  """
Out[56]:
array([[<AxesSubplot:title={'center':'datetime'}>,
        <AxesSubplot:title={'center':'temp'}>,
        <AxesSubplot:title={'center':'atemp'}>],
       [<AxesSubplot:title={'center':'humidity'}>,
        <AxesSubplot:title={'center':'windspeed'}>,
        <AxesSubplot:title={'center':'casual'}>],
       [<AxesSubplot:title={'center':'registered'}>,
        <AxesSubplot:title={'center':'count'}>,
        <AxesSubplot:title={'center':'year'}>],
       [<AxesSubplot:title={'center':'month'}>,
        <AxesSubplot:title={'center':'day'}>,
        <AxesSubplot:title={'center':'hour'}>]], dtype=object)

Step 5: Rerun the model with the same settings as before, just with more features¶

In [25]:
predictor_new_features = TabularPredictor(label="count", problem_type="regression", eval_metric="root_mean_squared_error").fit(
    train_data=train_new.loc[:, train_new.columns.difference(["casual","registered"])], time_limit=600, presets="best_quality"
)
No path specified. Models will be saved in: "AutogluonModels/ag-20221023_205528/"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20221023_205528/"
AutoGluon Version:  0.5.2
Python Version:     3.7.10
Operating System:   Linux
Train Data Rows:    10886
Train Data Columns: 13
Label Column: count
Preprocessing data ...
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    1982.02 MB
	Train Data (Original)  Memory Usage: 1.13 MB (0.1% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting DatetimeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('datetime', []) : 1 | ['datetime']
		('float', [])    : 3 | ['atemp', 'temp', 'windspeed']
		('int', [])      : 9 | ['day', 'holiday', 'hour', 'humidity', 'month', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('float', [])                : 3 | ['atemp', 'temp', 'windspeed']
		('int', [])                  : 6 | ['day', 'hour', 'humidity', 'month', 'season', ...]
		('int', ['bool'])            : 3 | ['holiday', 'workingday', 'year']
		('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
	0.1s = Fit runtime
	13 features in original data used to generate 17 features in processed data.
	Train Data (Processed) Memory Usage: 1.25 MB (0.1% of available memory)
Data preprocessing and feature engineering runtime = 0.19s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
	This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
	To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.77s of the 599.8s of remaining time.
	-101.5462	 = Validation score   (-root_mean_squared_error)
	0.04s	 = Training   runtime
	0.1s	 = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 399.39s of the 599.42s of remaining time.
	-84.1251	 = Validation score   (-root_mean_squared_error)
	0.04s	 = Training   runtime
	0.1s	 = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 399.01s of the 599.04s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-34.4947	 = Validation score   (-root_mean_squared_error)
	91.94s	 = Training   runtime
	10.76s	 = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 301.77s of the 501.8s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-33.9992	 = Validation score   (-root_mean_squared_error)
	45.93s	 = Training   runtime
	3.3s	 = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 252.02s of the 452.05s of remaining time.
	-38.3986	 = Validation score   (-root_mean_squared_error)
	14.11s	 = Training   runtime
	0.58s	 = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 234.83s of the 434.86s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-33.2593	 = Validation score   (-root_mean_squared_error)
	197.88s	 = Training   runtime
	0.15s	 = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 33.93s of the 233.95s of remaining time.
	-38.4819	 = Validation score   (-root_mean_squared_error)
	6.2s	 = Training   runtime
	0.56s	 = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 24.63s of the 224.66s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-102.6377	 = Validation score   (-root_mean_squared_error)
	44.0s	 = Training   runtime
	0.45s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 176.55s of remaining time.
	-31.9641	 = Validation score   (-root_mean_squared_error)
	0.51s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 175.97s of the 175.95s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-31.054	 = Validation score   (-root_mean_squared_error)
	27.65s	 = Training   runtime
	0.59s	 = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 144.86s of the 144.84s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-30.748	 = Validation score   (-root_mean_squared_error)
	24.02s	 = Training   runtime
	0.22s	 = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 117.68s of the 117.66s of remaining time.
	-31.7362	 = Validation score   (-root_mean_squared_error)
	31.06s	 = Training   runtime
	0.75s	 = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 83.49s of the 83.47s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-30.4666	 = Validation score   (-root_mean_squared_error)
	67.4s	 = Training   runtime
	0.1s	 = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 13.11s of the 13.09s of remaining time.
	-31.4382	 = Validation score   (-root_mean_squared_error)
	9.6s	 = Training   runtime
	0.62s	 = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L2 ... Training model for up to 0.39s of the 0.37s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	Time limit exceeded... Skipping NeuralNetFastAI_BAG_L2.
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -8.48s of remaining time.
	-30.2526	 = Validation score   (-root_mean_squared_error)
	0.32s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 609.0s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221023_205528/")
In [26]:
predictor_new_features.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
                     model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0      WeightedEnsemble_L3  -30.252623      17.556348  529.136375                0.000796           0.323466            3       True         15
1          CatBoost_BAG_L2  -30.466598      16.121886  467.535128                0.095946          67.399496            2       True         13
2          LightGBM_BAG_L2  -30.747974      16.246617  424.158355                0.220676          24.022724            2       True         11
3        LightGBMXT_BAG_L2  -31.053955      16.615151  427.785796                0.589211          27.650164            2       True         10
4     ExtraTreesMSE_BAG_L2  -31.438204      16.649720  409.740524                0.623779           9.604893            2       True         14
5   RandomForestMSE_BAG_L2  -31.736159      16.777495  431.193593                0.751554          31.057962            2       True         12
6      WeightedEnsemble_L2  -31.964149      14.909500  350.399838                0.000958           0.506435            2       True          9
7          CatBoost_BAG_L1  -33.259343       0.154472  197.879862                0.154472         197.879862            1       True          6
8          LightGBM_BAG_L1  -33.999247       3.303141   45.928752                3.303141          45.928752            1       True          4
9        LightGBMXT_BAG_L1  -34.494653      10.764218   91.938161               10.764218          91.938161            1       True          3
10  RandomForestMSE_BAG_L1  -38.398605       0.583024   14.109219                0.583024          14.109219            1       True          5
11    ExtraTreesMSE_BAG_L1  -38.481929       0.564917    6.199713                0.564917           6.199713            1       True          7
12   KNeighborsDist_BAG_L1  -84.125061       0.103687    0.037409                0.103687           0.037409            1       True          2
13   KNeighborsUnif_BAG_L1 -101.546199       0.102817    0.042574                0.102817           0.042574            1       True          1
14  NeuralNetFastAI_BAG_L1 -102.637717       0.449665   43.999941                0.449665          43.999941            1       True          8
Number of models trained: 15
Types of models trained:
{'StackerEnsembleModel_KNN', 'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_XT', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_RF', 'WeightedEnsembleModel', 'StackerEnsembleModel_LGB'}
Bagging used: True  (with 8 folds)
Multi-layer stack-ensembling used: True  (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('float', [])                : 3 | ['atemp', 'temp', 'windspeed']
('int', [])                  : 6 | ['day', 'hour', 'humidity', 'month', 'season', ...]
('int', ['bool'])            : 3 | ['holiday', 'workingday', 'year']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20221023_205528/SummaryOfModels.html
*** End of fit() summary ***
Out[26]:
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
  'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
  'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
  'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel',
  'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
  'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
 'model_performance': {'KNeighborsUnif_BAG_L1': -101.54619908446061,
  'KNeighborsDist_BAG_L1': -84.12506123181602,
  'LightGBMXT_BAG_L1': -34.494653086618804,
  'LightGBM_BAG_L1': -33.99924669761093,
  'RandomForestMSE_BAG_L1': -38.39860545954656,
  'CatBoost_BAG_L1': -33.25934270773738,
  'ExtraTreesMSE_BAG_L1': -38.481929349341094,
  'NeuralNetFastAI_BAG_L1': -102.63771662405847,
  'WeightedEnsemble_L2': -31.964149052890754,
  'LightGBMXT_BAG_L2': -31.0539547053999,
  'LightGBM_BAG_L2': -30.747973719282207,
  'RandomForestMSE_BAG_L2': -31.736159000059434,
  'CatBoost_BAG_L2': -30.466598375481624,
  'ExtraTreesMSE_BAG_L2': -31.438203849948902,
  'WeightedEnsemble_L3': -30.25262269369684},
 'model_best': 'WeightedEnsemble_L3',
 'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20221023_205528/models/KNeighborsUnif_BAG_L1/',
  'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20221023_205528/models/KNeighborsDist_BAG_L1/',
  'LightGBMXT_BAG_L1': 'AutogluonModels/ag-20221023_205528/models/LightGBMXT_BAG_L1/',
  'LightGBM_BAG_L1': 'AutogluonModels/ag-20221023_205528/models/LightGBM_BAG_L1/',
  'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20221023_205528/models/RandomForestMSE_BAG_L1/',
  'CatBoost_BAG_L1': 'AutogluonModels/ag-20221023_205528/models/CatBoost_BAG_L1/',
  'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20221023_205528/models/ExtraTreesMSE_BAG_L1/',
  'NeuralNetFastAI_BAG_L1': 'AutogluonModels/ag-20221023_205528/models/NeuralNetFastAI_BAG_L1/',
  'WeightedEnsemble_L2': 'AutogluonModels/ag-20221023_205528/models/WeightedEnsemble_L2/',
  'LightGBMXT_BAG_L2': 'AutogluonModels/ag-20221023_205528/models/LightGBMXT_BAG_L2/',
  'LightGBM_BAG_L2': 'AutogluonModels/ag-20221023_205528/models/LightGBM_BAG_L2/',
  'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20221023_205528/models/RandomForestMSE_BAG_L2/',
  'CatBoost_BAG_L2': 'AutogluonModels/ag-20221023_205528/models/CatBoost_BAG_L2/',
  'ExtraTreesMSE_BAG_L2': 'AutogluonModels/ag-20221023_205528/models/ExtraTreesMSE_BAG_L2/',
  'WeightedEnsemble_L3': 'AutogluonModels/ag-20221023_205528/models/WeightedEnsemble_L3/'},
 'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.04257369041442871,
  'KNeighborsDist_BAG_L1': 0.03740859031677246,
  'LightGBMXT_BAG_L1': 91.93816137313843,
  'LightGBM_BAG_L1': 45.928752183914185,
  'RandomForestMSE_BAG_L1': 14.10921859741211,
  'CatBoost_BAG_L1': 197.8798623085022,
  'ExtraTreesMSE_BAG_L1': 6.199713468551636,
  'NeuralNetFastAI_BAG_L1': 43.99994111061096,
  'WeightedEnsemble_L2': 0.5064353942871094,
  'LightGBMXT_BAG_L2': 27.650164365768433,
  'LightGBM_BAG_L2': 24.02272391319275,
  'RandomForestMSE_BAG_L2': 31.05796194076538,
  'CatBoost_BAG_L2': 67.39949631690979,
  'ExtraTreesMSE_BAG_L2': 9.60489273071289,
  'WeightedEnsemble_L3': 0.32346630096435547},
 'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.10281658172607422,
  'KNeighborsDist_BAG_L1': 0.10368657112121582,
  'LightGBMXT_BAG_L1': 10.764217853546143,
  'LightGBM_BAG_L1': 3.3031413555145264,
  'RandomForestMSE_BAG_L1': 0.583024263381958,
  'CatBoost_BAG_L1': 0.15447187423706055,
  'ExtraTreesMSE_BAG_L1': 0.5649173259735107,
  'NeuralNetFastAI_BAG_L1': 0.449664831161499,
  'WeightedEnsemble_L2': 0.0009582042694091797,
  'LightGBMXT_BAG_L2': 0.5892107486724854,
  'LightGBM_BAG_L2': 0.22067594528198242,
  'RandomForestMSE_BAG_L2': 0.7515542507171631,
  'CatBoost_BAG_L2': 0.09594583511352539,
  'ExtraTreesMSE_BAG_L2': 0.6237790584564209,
  'WeightedEnsemble_L3': 0.0007958412170410156},
 'num_bag_folds': 8,
 'max_stack_level': 3,
 'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'KNeighborsDist_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'LightGBMXT_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L2': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'WeightedEnsemble_L3': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True}},
 'leaderboard':                      model   score_val  pred_time_val    fit_time  \
 0      WeightedEnsemble_L3  -30.252623      17.556348  529.136375   
 1          CatBoost_BAG_L2  -30.466598      16.121886  467.535128   
 2          LightGBM_BAG_L2  -30.747974      16.246617  424.158355   
 3        LightGBMXT_BAG_L2  -31.053955      16.615151  427.785796   
 4     ExtraTreesMSE_BAG_L2  -31.438204      16.649720  409.740524   
 5   RandomForestMSE_BAG_L2  -31.736159      16.777495  431.193593   
 6      WeightedEnsemble_L2  -31.964149      14.909500  350.399838   
 7          CatBoost_BAG_L1  -33.259343       0.154472  197.879862   
 8          LightGBM_BAG_L1  -33.999247       3.303141   45.928752   
 9        LightGBMXT_BAG_L1  -34.494653      10.764218   91.938161   
 10  RandomForestMSE_BAG_L1  -38.398605       0.583024   14.109219   
 11    ExtraTreesMSE_BAG_L1  -38.481929       0.564917    6.199713   
 12   KNeighborsDist_BAG_L1  -84.125061       0.103687    0.037409   
 13   KNeighborsUnif_BAG_L1 -101.546199       0.102817    0.042574   
 14  NeuralNetFastAI_BAG_L1 -102.637717       0.449665   43.999941   
 
     pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  \
 0                 0.000796           0.323466            3       True   
 1                 0.095946          67.399496            2       True   
 2                 0.220676          24.022724            2       True   
 3                 0.589211          27.650164            2       True   
 4                 0.623779           9.604893            2       True   
 5                 0.751554          31.057962            2       True   
 6                 0.000958           0.506435            2       True   
 7                 0.154472         197.879862            1       True   
 8                 3.303141          45.928752            1       True   
 9                10.764218          91.938161            1       True   
 10                0.583024          14.109219            1       True   
 11                0.564917           6.199713            1       True   
 12                0.103687           0.037409            1       True   
 13                0.102817           0.042574            1       True   
 14                0.449665          43.999941            1       True   
 
     fit_order  
 0          15  
 1          13  
 2          11  
 3          10  
 4          14  
 5          12  
 6           9  
 7           6  
 8           4  
 9           3  
 10          5  
 11          7  
 12          2  
 13          1  
 14          8  }
In [27]:
performance = predictor_new_features.evaluate(test_new)
print("The performance indicators are : \n", performance)
/usr/local/lib/python3.7/site-packages/scipy/stats/stats.py:4023: PearsonRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(PearsonRConstantInputWarning())
Evaluation: root_mean_squared_error on test data: -196.38650113235116
	Note: Scores are always higher_is_better. This metric score can be multiplied by -1 to get the metric value.
Evaluations on test data:
{
    "root_mean_squared_error": -196.38650113235116,
    "mean_squared_error": -38567.65782700696,
    "mean_absolute_error": -149.05909678496474,
    "r2": 0.0,
    "pearsonr": NaN,
    "median_absolute_error": -116.65669250488281
}
The performance indicators are : 
 {'root_mean_squared_error': -196.38650113235116, 'mean_squared_error': -38567.65782700696, 'mean_absolute_error': -149.05909678496474, 'r2': 0.0, 'pearsonr': nan, 'median_absolute_error': -116.65669250488281}
In [58]:
# Remember to set all negative values to zero
predictions_new_features = predictor.predict(test_new)
In [41]:
print('Negative predictions are :', predictions_new_features[predictions_new_features<0])
Negative predictions are : Series([], Name: count, dtype: float32)
In [59]:
# Same submitting predictions
submission_new_features = submission
submission_new_features["count"] = predictions_new_features
submission_new_features.to_csv("submission_new_features.csv", index=False)
In [60]:
!kaggle competitions submit -c bike-sharing-demand -f submission_new_features.csv -m "new features + set weather, holiday, season, workingday "
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 290kB/s]
Successfully submitted to Bike Sharing Demand
In [61]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description                                               status    publicScore  privateScore  
---------------------------  -------------------  --------------------------------------------------------  --------  -----------  ------------  
submission_new_features.csv  2022-10-19 20:23:34  new features + set weather, holiday, season, workingday   complete  1.80119      1.80119       
submission_new_features.csv  2022-10-19 19:25:25  new features + set weather, holiday, season, workingday   complete  1.80895      1.80895       
submission.csv               2022-10-19 18:30:41  3nd raw submission                                        complete  1.80895      1.80895       
submission_new_features.csv  2022-10-16 22:46:40  new features                                              complete  1.80152      1.80152       

New Score of 1.80119¶

Removing datetime variable¶

In [11]:
predictor_wo_datetime = TabularPredictor(label="count", problem_type="regression", eval_metric="root_mean_squared_error").fit(
    train_data=train_new.loc[:, train_new.columns.difference(["datetime","casual","registered"])], time_limit=600, presets="best_quality"
)
No path specified. Models will be saved in: "AutogluonModels/ag-20221023_202702/"
Presets specified: ['best_quality']
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20221023_202702/"
AutoGluon Version:  0.5.2
Python Version:     3.7.10
Operating System:   Linux
Train Data Rows:    10886
Train Data Columns: 12
Label Column: count
Preprocessing data ...
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    2982.12 MB
	Train Data (Original)  Memory Usage: 1.05 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('float', []) : 3 | ['atemp', 'temp', 'windspeed']
		('int', [])   : 9 | ['day', 'holiday', 'hour', 'humidity', 'month', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('float', [])     : 3 | ['atemp', 'temp', 'windspeed']
		('int', [])       : 6 | ['day', 'hour', 'humidity', 'month', 'season', ...]
		('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
	0.2s = Fit runtime
	12 features in original data used to generate 12 features in processed data.
	Train Data (Processed) Memory Usage: 0.82 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.29s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
	This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
	To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.7s of the 599.69s of remaining time.
	-123.781	 = Validation score   (-root_mean_squared_error)
	0.07s	 = Training   runtime
	0.3s	 = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 399.07s of the 599.07s of remaining time.
	-119.1941	 = Validation score   (-root_mean_squared_error)
	0.03s	 = Training   runtime
	0.2s	 = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 398.61s of the 598.6s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
2022-10-23 20:27:05,406	WARNING services.py:2013 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 416284672 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=0.96gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
	-37.5315	 = Validation score   (-root_mean_squared_error)
	102.5s	 = Training   runtime
	20.97s	 = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 288.6s of the 488.59s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-37.8753	 = Validation score   (-root_mean_squared_error)
	39.71s	 = Training   runtime
	3.34s	 = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 245.08s of the 445.08s of remaining time.
	-42.1538	 = Validation score   (-root_mean_squared_error)
	9.96s	 = Training   runtime
	0.57s	 = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 231.84s of the 431.83s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-37.551	 = Validation score   (-root_mean_squared_error)
	196.23s	 = Training   runtime
	0.15s	 = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 32.44s of the 232.44s of remaining time.
	-41.5225	 = Validation score   (-root_mean_squared_error)
	5.22s	 = Training   runtime
	0.7s	 = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 23.79s of the 223.78s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-103.9499	 = Validation score   (-root_mean_squared_error)
	45.05s	 = Training   runtime
	0.49s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 175.38s of remaining time.
	-36.0226	 = Validation score   (-root_mean_squared_error)
	1.01s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 174.26s of the 174.23s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-36.9115	 = Validation score   (-root_mean_squared_error)
	23.33s	 = Training   runtime
	0.27s	 = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 147.36s of the 147.33s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-36.5501	 = Validation score   (-root_mean_squared_error)
	22.53s	 = Training   runtime
	0.11s	 = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 121.22s of the 121.19s of remaining time.
	-37.1253	 = Validation score   (-root_mean_squared_error)
	28.43s	 = Training   runtime
	0.88s	 = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 89.22s of the 89.2s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-36.2752	 = Validation score   (-root_mean_squared_error)
	30.21s	 = Training   runtime
	0.06s	 = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 55.91s of the 55.88s of remaining time.
	-36.3866	 = Validation score   (-root_mean_squared_error)
	8.45s	 = Training   runtime
	0.68s	 = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L2 ... Training model for up to 43.93s of the 43.91s of remaining time.
	Fitting 8 child models (S1F1 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-36.7555	 = Validation score   (-root_mean_squared_error)
	59.6s	 = Training   runtime
	0.48s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -19.01s of remaining time.
	-35.9994	 = Validation score   (-root_mean_squared_error)
	0.43s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 619.68s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221023_202702/")
In [12]:
predictor_wo_datetime.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
                     model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0      WeightedEnsemble_L3  -35.999372      28.068213  520.002633                0.001175           0.430852            3       True         16
1      WeightedEnsemble_L2  -36.022646      25.732827  354.641560                0.001569           1.012878            2       True          9
2          CatBoost_BAG_L2  -36.275163      26.794154  428.989806                0.060683          30.213794            2       True         13
3     ExtraTreesMSE_BAG_L2  -36.386579      27.413426  407.229927                0.679956           8.453914            2       True         14
4          LightGBM_BAG_L2  -36.550106      26.843183  421.308851                0.109712          22.532838            2       True         11
5   NeuralNetFastAI_BAG_L2  -36.755484      27.216686  458.371235                0.483216          59.595222            2       True         15
6        LightGBMXT_BAG_L2  -36.911479      27.003105  422.109320                0.269634          23.333308            2       True         10
7   RandomForestMSE_BAG_L2  -37.125313      27.612287  427.202532                0.878816          28.426519            2       True         12
8        LightGBMXT_BAG_L1  -37.531467      20.966673  102.497085               20.966673         102.497085            1       True          3
9          CatBoost_BAG_L1  -37.551041       0.148058  196.234118                0.148058         196.234118            1       True          6
10         LightGBM_BAG_L1  -37.875308       3.339185   39.712124                3.339185          39.712124            1       True          4
11    ExtraTreesMSE_BAG_L1  -41.522503       0.703364    5.222335                0.703364           5.222335            1       True          7
12  RandomForestMSE_BAG_L1  -42.153772       0.573977    9.963019                0.573977           9.963019            1       True          5
13  NeuralNetFastAI_BAG_L1 -103.949897       0.494219   45.049742                0.494219          45.049742            1       True          8
14   KNeighborsDist_BAG_L1 -119.194060       0.204184    0.029573                0.204184           0.029573            1       True          2
15   KNeighborsUnif_BAG_L1 -123.781003       0.303809    0.068016                0.303809           0.068016            1       True          1
Number of models trained: 16
Types of models trained:
{'StackerEnsembleModel_KNN', 'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_XT', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_RF', 'WeightedEnsembleModel', 'StackerEnsembleModel_LGB'}
Bagging used: True  (with 8 folds)
Multi-layer stack-ensembling used: True  (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('float', [])     : 3 | ['atemp', 'temp', 'windspeed']
('int', [])       : 6 | ['day', 'hour', 'humidity', 'month', 'season', ...]
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
Plot summary of models saved to file: AutogluonModels/ag-20221023_202702/SummaryOfModels.html
*** End of fit() summary ***
Out[12]:
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
  'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
  'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
  'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel',
  'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
  'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
  'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
  'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
  'NeuralNetFastAI_BAG_L2': 'StackerEnsembleModel_NNFastAiTabular',
  'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
 'model_performance': {'KNeighborsUnif_BAG_L1': -123.78100255079445,
  'KNeighborsDist_BAG_L1': -119.19406017177057,
  'LightGBMXT_BAG_L1': -37.531466866876414,
  'LightGBM_BAG_L1': -37.87530799234931,
  'RandomForestMSE_BAG_L1': -42.15377231942091,
  'CatBoost_BAG_L1': -37.5510406746713,
  'ExtraTreesMSE_BAG_L1': -41.52250272210862,
  'NeuralNetFastAI_BAG_L1': -103.94989656652723,
  'WeightedEnsemble_L2': -36.022645556361475,
  'LightGBMXT_BAG_L2': -36.91147853077231,
  'LightGBM_BAG_L2': -36.55010647506041,
  'RandomForestMSE_BAG_L2': -37.12531330226235,
  'CatBoost_BAG_L2': -36.27516272432256,
  'ExtraTreesMSE_BAG_L2': -36.3865787954654,
  'NeuralNetFastAI_BAG_L2': -36.75548401910065,
  'WeightedEnsemble_L3': -35.99937150190432},
 'model_best': 'WeightedEnsemble_L3',
 'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20221023_202702/models/KNeighborsUnif_BAG_L1/',
  'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20221023_202702/models/KNeighborsDist_BAG_L1/',
  'LightGBMXT_BAG_L1': 'AutogluonModels/ag-20221023_202702/models/LightGBMXT_BAG_L1/',
  'LightGBM_BAG_L1': 'AutogluonModels/ag-20221023_202702/models/LightGBM_BAG_L1/',
  'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20221023_202702/models/RandomForestMSE_BAG_L1/',
  'CatBoost_BAG_L1': 'AutogluonModels/ag-20221023_202702/models/CatBoost_BAG_L1/',
  'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20221023_202702/models/ExtraTreesMSE_BAG_L1/',
  'NeuralNetFastAI_BAG_L1': 'AutogluonModels/ag-20221023_202702/models/NeuralNetFastAI_BAG_L1/',
  'WeightedEnsemble_L2': 'AutogluonModels/ag-20221023_202702/models/WeightedEnsemble_L2/',
  'LightGBMXT_BAG_L2': 'AutogluonModels/ag-20221023_202702/models/LightGBMXT_BAG_L2/',
  'LightGBM_BAG_L2': 'AutogluonModels/ag-20221023_202702/models/LightGBM_BAG_L2/',
  'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20221023_202702/models/RandomForestMSE_BAG_L2/',
  'CatBoost_BAG_L2': 'AutogluonModels/ag-20221023_202702/models/CatBoost_BAG_L2/',
  'ExtraTreesMSE_BAG_L2': 'AutogluonModels/ag-20221023_202702/models/ExtraTreesMSE_BAG_L2/',
  'NeuralNetFastAI_BAG_L2': 'AutogluonModels/ag-20221023_202702/models/NeuralNetFastAI_BAG_L2/',
  'WeightedEnsemble_L3': 'AutogluonModels/ag-20221023_202702/models/WeightedEnsemble_L3/'},
 'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.06801605224609375,
  'KNeighborsDist_BAG_L1': 0.02957296371459961,
  'LightGBMXT_BAG_L1': 102.4970850944519,
  'LightGBM_BAG_L1': 39.71212434768677,
  'RandomForestMSE_BAG_L1': 9.963019371032715,
  'CatBoost_BAG_L1': 196.2341182231903,
  'ExtraTreesMSE_BAG_L1': 5.222334861755371,
  'NeuralNetFastAI_BAG_L1': 45.04974174499512,
  'WeightedEnsemble_L2': 1.012878179550171,
  'LightGBMXT_BAG_L2': 23.33330774307251,
  'LightGBM_BAG_L2': 22.532838106155396,
  'RandomForestMSE_BAG_L2': 28.42651891708374,
  'CatBoost_BAG_L2': 30.213793754577637,
  'ExtraTreesMSE_BAG_L2': 8.453914165496826,
  'NeuralNetFastAI_BAG_L2': 59.59522247314453,
  'WeightedEnsemble_L3': 0.43085169792175293},
 'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.30380916595458984,
  'KNeighborsDist_BAG_L1': 0.20418429374694824,
  'LightGBMXT_BAG_L1': 20.966673374176025,
  'LightGBM_BAG_L1': 3.3391854763031006,
  'RandomForestMSE_BAG_L1': 0.5739772319793701,
  'CatBoost_BAG_L1': 0.14805817604064941,
  'ExtraTreesMSE_BAG_L1': 0.7033638954162598,
  'NeuralNetFastAI_BAG_L1': 0.4942190647125244,
  'WeightedEnsemble_L2': 0.0015690326690673828,
  'LightGBMXT_BAG_L2': 0.2696342468261719,
  'LightGBM_BAG_L2': 0.10971236228942871,
  'RandomForestMSE_BAG_L2': 0.8788161277770996,
  'CatBoost_BAG_L2': 0.06068301200866699,
  'ExtraTreesMSE_BAG_L2': 0.6799557209014893,
  'NeuralNetFastAI_BAG_L2': 0.4832158088684082,
  'WeightedEnsemble_L3': 0.0011749267578125},
 'num_bag_folds': 8,
 'max_stack_level': 3,
 'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'KNeighborsDist_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'LightGBMXT_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L2': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'RandomForestMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'CatBoost_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True,
   'use_child_oof': True},
  'NeuralNetFastAI_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L3': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True}},
 'leaderboard':                      model   score_val  pred_time_val    fit_time  \
 0      WeightedEnsemble_L3  -35.999372      28.068213  520.002633   
 1      WeightedEnsemble_L2  -36.022646      25.732827  354.641560   
 2          CatBoost_BAG_L2  -36.275163      26.794154  428.989806   
 3     ExtraTreesMSE_BAG_L2  -36.386579      27.413426  407.229927   
 4          LightGBM_BAG_L2  -36.550106      26.843183  421.308851   
 5   NeuralNetFastAI_BAG_L2  -36.755484      27.216686  458.371235   
 6        LightGBMXT_BAG_L2  -36.911479      27.003105  422.109320   
 7   RandomForestMSE_BAG_L2  -37.125313      27.612287  427.202532   
 8        LightGBMXT_BAG_L1  -37.531467      20.966673  102.497085   
 9          CatBoost_BAG_L1  -37.551041       0.148058  196.234118   
 10         LightGBM_BAG_L1  -37.875308       3.339185   39.712124   
 11    ExtraTreesMSE_BAG_L1  -41.522503       0.703364    5.222335   
 12  RandomForestMSE_BAG_L1  -42.153772       0.573977    9.963019   
 13  NeuralNetFastAI_BAG_L1 -103.949897       0.494219   45.049742   
 14   KNeighborsDist_BAG_L1 -119.194060       0.204184    0.029573   
 15   KNeighborsUnif_BAG_L1 -123.781003       0.303809    0.068016   
 
     pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  \
 0                 0.001175           0.430852            3       True   
 1                 0.001569           1.012878            2       True   
 2                 0.060683          30.213794            2       True   
 3                 0.679956           8.453914            2       True   
 4                 0.109712          22.532838            2       True   
 5                 0.483216          59.595222            2       True   
 6                 0.269634          23.333308            2       True   
 7                 0.878816          28.426519            2       True   
 8                20.966673         102.497085            1       True   
 9                 0.148058         196.234118            1       True   
 10                3.339185          39.712124            1       True   
 11                0.703364           5.222335            1       True   
 12                0.573977           9.963019            1       True   
 13                0.494219          45.049742            1       True   
 14                0.204184           0.029573            1       True   
 15                0.303809           0.068016            1       True   
 
     fit_order  
 0          16  
 1           9  
 2          13  
 3          14  
 4          11  
 5          15  
 6          10  
 7          12  
 8           3  
 9           6  
 10          4  
 11          7  
 12          5  
 13          8  
 14          2  
 15          1  }
In [16]:
test_new["count"] = 0
performance = predictor_wo_datetime.evaluate(test_new.loc[:, test_new.columns.difference(["datetime"])])
print("The performance indicators are : \n", performance)
/usr/local/lib/python3.7/site-packages/scipy/stats/stats.py:4023: PearsonRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(PearsonRConstantInputWarning())
Evaluation: root_mean_squared_error on test data: -256.9979977163498
	Note: Scores are always higher_is_better. This metric score can be multiplied by -1 to get the metric value.
Evaluations on test data:
{
    "root_mean_squared_error": -256.9979977163498,
    "mean_squared_error": -66047.97083021293,
    "mean_absolute_error": -189.7036975675971,
    "r2": 0.0,
    "pearsonr": NaN,
    "median_absolute_error": -144.92901611328125
}
The performance indicators are : 
 {'root_mean_squared_error': -256.9979977163498, 'mean_squared_error': -66047.97083021293, 'mean_absolute_error': -189.7036975675971, 'r2': 0.0, 'pearsonr': nan, 'median_absolute_error': -144.92901611328125}
In [17]:
# Remember to set all negative values to zero
predictions_wo_datetime = predictor_wo_datetime.predict(test_new.loc[:, test_new.columns.difference(["datetime"])])
In [18]:
print('Negative predictions are :', predictions_wo_datetime[predictions_wo_datetime<0])
Negative predictions are : Series([], Name: count, dtype: float32)
In [22]:
# Same submitting predictions
submission_wo_datetime = submission
submission_wo_datetime["count"] = predictions_wo_datetime
submission_wo_datetime.to_csv("submission_new_features_no_datetime.csv", index=False)
In [23]:
!kaggle competitions submit -c bike-sharing-demand -f submission_new_features_no_datetime.csv -m "new features + without datetime + set weather, holiday, season, workingday as categorical data "
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 302kB/s]
Successfully submitted to Bike Sharing Demand
In [24]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                                 date                 description                                                                                      status    publicScore  privateScore  
---------------------------------------  -------------------  -----------------------------------------------------------------------------------------------  --------  -----------  ------------  
submission_new_features_no_datetime.csv  2022-10-23 20:47:25  new features + without datetime + set weather, holiday, season, workingday as categorical data   complete  0.47409      0.47409       
submission_new_hpo.csv                   2022-10-23 02:55:42  new features with hyperparameter tuning of GBM and XGBoost                                       complete  0.47866      0.47866       
submission_new_hpo.csv                   2022-10-23 02:24:32  new features with hyperparameter tuning of GBM and XGBoost                                       complete  0.47866      0.47866       
submission_new_hpo.csv                   2022-10-19 22:30:48  new features with hyperparameters                                                                complete  0.48898      0.48898       

New Score of 1.80119¶

Step 6: Hyper parameter optimization¶

  • There are many options for hyper parameter optimization.
  • Options are to change the AutoGluon higher level parameters or the individual model hyperparameters.
  • The hyperparameters of the models themselves that are in AutoGluon. Those need the hyperparameter and hyperparameter_tune_kwargs arguments.

First hpo attempt¶

In this first attempt to tune the hyperparameters, I will change two factors :

  • setting the hyperparameters to default
  • setting the hyperparameter_tune_kwargs to automatic mode
In [28]:
hyperparameters = 'default'
hyperparameter_tune_kwargs = 'auto'

predictor_new_hpo = TabularPredictor(label="count", problem_type="regression", eval_metric="root_mean_squared_error").fit(
    train_data=train_new.loc[:, train_new.columns.difference(["casual","registered"])], time_limit=600, presets="best_quality",
    hyperparameters=hyperparameters, hyperparameter_tune_kwargs=hyperparameter_tune_kwargs)
No path specified. Models will be saved in: "AutogluonModels/ag-20221023_211333/"
Presets specified: ['best_quality']
Warning: hyperparameter tuning is currently experimental and may cause the process to hang.
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20221023_211333/"
AutoGluon Version:  0.5.2
Python Version:     3.7.10
Operating System:   Linux
Train Data Rows:    10886
Train Data Columns: 13
Label Column: count
Preprocessing data ...
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    2346.99 MB
	Train Data (Original)  Memory Usage: 1.13 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting DatetimeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('datetime', []) : 1 | ['datetime']
		('float', [])    : 3 | ['atemp', 'temp', 'windspeed']
		('int', [])      : 9 | ['day', 'holiday', 'hour', 'humidity', 'month', ...]
	Types of features in processed data (raw dtype, special dtypes):
		('float', [])                : 3 | ['atemp', 'temp', 'windspeed']
		('int', [])                  : 6 | ['day', 'hour', 'humidity', 'month', 'season', ...]
		('int', ['bool'])            : 3 | ['holiday', 'workingday', 'year']
		('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
	0.1s = Fit runtime
	13 features in original data used to generate 17 features in processed data.
	Train Data (Processed) Memory Usage: 1.25 MB (0.1% of available memory)
Data preprocessing and feature engineering runtime = 0.18s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
	This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
	To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Hyperparameter tuning model: KNeighborsUnif_BAG_L1 ... Tuning model for up to 4.09s of the 599.81s of remaining time.
	No hyperparameter search space specified for KNeighborsUnif. Skipping HPO. Will train one model based on the provided hyperparameters.
Warning: Exception caused KNeighborsUnif_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 1001, in hyperparameter_tune
    return self._hyperparameter_tune(hpo_executor=hpo_executor, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 182, in _hyperparameter_tune
    return super()._hyperparameter_tune(X=X, y=y, k_fold=k_fold, hpo_executor=hpo_executor, preprocess_kwargs=preprocess_kwargs, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 1080, in _hyperparameter_tune
    model_path = model_info['path']
TypeError: string indices must be integers
string indices must be integers
Hyperparameter tuning model: KNeighborsDist_BAG_L1 ... Tuning model for up to 4.09s of the 599.53s of remaining time.
	No hyperparameter search space specified for KNeighborsDist. Skipping HPO. Will train one model based on the provided hyperparameters.
Warning: Exception caused KNeighborsDist_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 1001, in hyperparameter_tune
    return self._hyperparameter_tune(hpo_executor=hpo_executor, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 182, in _hyperparameter_tune
    return super()._hyperparameter_tune(X=X, y=y, k_fold=k_fold, hpo_executor=hpo_executor, preprocess_kwargs=preprocess_kwargs, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 1080, in _hyperparameter_tune
    model_path = model_info['path']
TypeError: string indices must be integers
string indices must be integers
Hyperparameter tuning model: LightGBMXT_BAG_L1 ... Tuning model for up to 4.09s of the 599.27s of remaining time.
[1000]	valid_set's rmse: 35.8966
[2000]	valid_set's rmse: 33.8791
	Ran out of time, early stopping on iteration 2741. Best iteration is:
	[2729]	valid_set's rmse: 33.4549
	Stopping HPO to satisfy time limit...
Fitted model: LightGBMXT_BAG_L1/T1 ...
	-33.4549	 = Validation score   (-root_mean_squared_error)
	3.75s	 = Training   runtime
	0.23s	 = Validation runtime
Hyperparameter tuning model: LightGBM_BAG_L1 ... Tuning model for up to 4.09s of the 593.6s of remaining time.
[1000]	valid_set's rmse: 33.0497
	Stopping HPO to satisfy time limit...
Fitted model: LightGBM_BAG_L1/T1 ...
	-32.9844	 = Validation score   (-root_mean_squared_error)
	1.65s	 = Training   runtime
	0.07s	 = Validation runtime
Hyperparameter tuning model: RandomForestMSE_BAG_L1 ... Tuning model for up to 4.09s of the 591.11s of remaining time.
	No hyperparameter search space specified for RandomForestMSE. Skipping HPO. Will train one model based on the provided hyperparameters.
Warning: Exception caused RandomForestMSE_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 1001, in hyperparameter_tune
    return self._hyperparameter_tune(hpo_executor=hpo_executor, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 182, in _hyperparameter_tune
    return super()._hyperparameter_tune(X=X, y=y, k_fold=k_fold, hpo_executor=hpo_executor, preprocess_kwargs=preprocess_kwargs, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 1080, in _hyperparameter_tune
    model_path = model_info['path']
TypeError: string indices must be integers
string indices must be integers
Hyperparameter tuning model: CatBoost_BAG_L1 ... Tuning model for up to 4.09s of the 575.31s of remaining time.
	Ran out of time, early stopping on iteration 1026.
	Stopping HPO to satisfy time limit...
Fitted model: CatBoost_BAG_L1/T1 ...
	-34.0167	 = Validation score   (-root_mean_squared_error)
	3.2s	 = Training   runtime
	0.0s	 = Validation runtime
Hyperparameter tuning model: ExtraTreesMSE_BAG_L1 ... Tuning model for up to 4.09s of the 571.78s of remaining time.
	No hyperparameter search space specified for ExtraTreesMSE. Skipping HPO. Will train one model based on the provided hyperparameters.
Warning: Exception caused ExtraTreesMSE_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 1001, in hyperparameter_tune
    return self._hyperparameter_tune(hpo_executor=hpo_executor, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 182, in _hyperparameter_tune
    return super()._hyperparameter_tune(X=X, y=y, k_fold=k_fold, hpo_executor=hpo_executor, preprocess_kwargs=preprocess_kwargs, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 1080, in _hyperparameter_tune
    model_path = model_info['path']
TypeError: string indices must be integers
string indices must be integers
Hyperparameter tuning model: NeuralNetFastAI_BAG_L1 ... Tuning model for up to 4.09s of the 563.19s of remaining time.
2022-10-23 21:14:13,326	WARNING services.py:2013 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 416284672 bytes available. This will harm performance! You may be able to free up space by deleting files in /dev/shm. If you are inside a Docker container, you can increase /dev/shm size by passing '--shm-size=0.75gb' to 'docker run' (or add it to the run_options list in a Ray cluster config). Make sure to set this to more than 30% of available RAM.
2022-10-23 21:14:16,738	ERROR syncer.py:147 -- Log sync requires rsync to be installed.
NaN or Inf found in input tensor.
2022-10-23 21:14:21,238	INFO stopper.py:364 -- Reached timeout of 3.270924513194561 seconds. Stopping all trials.
Hyperparameter tuning model: XGBoost_BAG_L1 ... Tuning model for up to 4.09s of the 552.04s of remaining time.
	Stopping HPO to satisfy time limit...
Fitted model: XGBoost_BAG_L1/T1 ...
	-33.5839	 = Validation score   (-root_mean_squared_error)
	3.77s	 = Training   runtime
	0.03s	 = Validation runtime
Hyperparameter tuning model: NeuralNetTorch_BAG_L1 ... Tuning model for up to 4.09s of the 547.76s of remaining time.
NaN or Inf found in input tensor.
2022-10-23 21:14:31,574	INFO stopper.py:364 -- Reached timeout of 3.270924513194561 seconds. Stopping all trials.
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 4.09s of the 541.71s of remaining time.
	Fitting 1 child models (S1F1 - S1F1) | Fitting with ParallelLocalFoldFittingStrategy
	-33.1828	 = Validation score   (-root_mean_squared_error)
	4.83s	 = Training   runtime
	0.1s	 = Validation runtime
Fitting model: LightGBMXT_BAG_L1/T1 ... Training model for up to 333.91s of the 533.94s of remaining time.
	Fitting 7 child models (S1F2 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-34.5195	 = Validation score   (-root_mean_squared_error)
	86.97s	 = Training   runtime
	10.64s	 = Validation runtime
Fitting model: LightGBM_BAG_L1/T1 ... Training model for up to 246.4s of the 446.44s of remaining time.
	Fitting 7 child models (S1F2 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-33.9992	 = Validation score   (-root_mean_squared_error)
	42.88s	 = Training   runtime
	3.05s	 = Validation runtime
Fitting model: CatBoost_BAG_L1/T1 ... Training model for up to 201.41s of the 401.45s of remaining time.
	Fitting 7 child models (S1F2 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-33.5594	 = Validation score   (-root_mean_squared_error)
	173.38s	 = Training   runtime
	0.1s	 = Validation runtime
Fitting model: XGBoost_BAG_L1/T1 ... Training model for up to 28.04s of the 228.08s of remaining time.
	Fitting 7 child models (S1F2 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-34.4226	 = Validation score   (-root_mean_squared_error)
	33.25s	 = Training   runtime
	0.58s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 195.44s of remaining time.
	-32.2607	 = Validation score   (-root_mean_squared_error)
	0.29s	 = Training   runtime
	0.0s	 = Validation runtime
Fitting 9 L2 models ...
Hyperparameter tuning model: LightGBMXT_BAG_L2 ... Tuning model for up to 2.44s of the 195.06s of remaining time.
	Stopping HPO to satisfy time limit...
Fitted model: LightGBMXT_BAG_L2/T1 ...
	-37.3993	 = Validation score   (-root_mean_squared_error)
	1.03s	 = Training   runtime
	0.02s	 = Validation runtime
Hyperparameter tuning model: LightGBM_BAG_L2 ... Tuning model for up to 2.44s of the 193.65s of remaining time.
	Ran out of time, early stopping on iteration 311. Best iteration is:
	[60]	valid_set's rmse: 36.8695
	Stopping HPO to satisfy time limit...
Fitted model: LightGBM_BAG_L2/T1 ...
	-36.684	 = Validation score   (-root_mean_squared_error)
	0.81s	 = Training   runtime
	0.01s	 = Validation runtime
Fitted model: LightGBM_BAG_L2/T2 ...
	-36.8695	 = Validation score   (-root_mean_squared_error)
	1.01s	 = Training   runtime
	0.01s	 = Validation runtime
Hyperparameter tuning model: RandomForestMSE_BAG_L2 ... Tuning model for up to 2.44s of the 191.31s of remaining time.
	No hyperparameter search space specified for RandomForestMSE. Skipping HPO. Will train one model based on the provided hyperparameters.
Warning: Exception caused RandomForestMSE_BAG_L2 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 1001, in hyperparameter_tune
    return self._hyperparameter_tune(hpo_executor=hpo_executor, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 182, in _hyperparameter_tune
    return super()._hyperparameter_tune(X=X, y=y, k_fold=k_fold, hpo_executor=hpo_executor, preprocess_kwargs=preprocess_kwargs, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 1080, in _hyperparameter_tune
    model_path = model_info['path']
TypeError: string indices must be integers
string indices must be integers
Hyperparameter tuning model: CatBoost_BAG_L2 ... Tuning model for up to 2.44s of the 164.68s of remaining time.
	Ran out of time, early stopping on iteration 441.
	Stopping HPO to satisfy time limit...
Fitted model: CatBoost_BAG_L2/T1 ...
	-36.0055	 = Validation score   (-root_mean_squared_error)
	1.86s	 = Training   runtime
	0.0s	 = Validation runtime
Hyperparameter tuning model: ExtraTreesMSE_BAG_L2 ... Tuning model for up to 2.44s of the 162.52s of remaining time.
	No hyperparameter search space specified for ExtraTreesMSE. Skipping HPO. Will train one model based on the provided hyperparameters.
Warning: Exception caused ExtraTreesMSE_BAG_L2 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 1001, in hyperparameter_tune
    return self._hyperparameter_tune(hpo_executor=hpo_executor, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/ensemble/stacker_ensemble_model.py", line 182, in _hyperparameter_tune
    return super()._hyperparameter_tune(X=X, y=y, k_fold=k_fold, hpo_executor=hpo_executor, preprocess_kwargs=preprocess_kwargs, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/ensemble/bagged_ensemble_model.py", line 1080, in _hyperparameter_tune
    model_path = model_info['path']
TypeError: string indices must be integers
string indices must be integers
Hyperparameter tuning model: NeuralNetFastAI_BAG_L2 ... Tuning model for up to 2.44s of the 151.54s of remaining time.
NaN or Inf found in input tensor.
2022-10-23 21:21:08,142	INFO stopper.py:364 -- Reached timeout of 1.950755183696747 seconds. Stopping all trials.
Hyperparameter tuning model: XGBoost_BAG_L2 ... Tuning model for up to 2.44s of the 145.0s of remaining time.
	Stopping HPO to satisfy time limit...
Fitted model: XGBoost_BAG_L2/T1 ...
	-37.0546	 = Validation score   (-root_mean_squared_error)
	1.51s	 = Training   runtime
	0.01s	 = Validation runtime
Hyperparameter tuning model: NeuralNetTorch_BAG_L2 ... Tuning model for up to 2.44s of the 143.11s of remaining time.
NaN or Inf found in input tensor.
2022-10-23 21:21:16,165	INFO stopper.py:364 -- Reached timeout of 1.950755183696747 seconds. Stopping all trials.
Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 2.44s of the 136.94s of remaining time.
	Fitting 1 child models (S1F1 - S1F1) | Fitting with ParallelLocalFoldFittingStrategy
	-37.4264	 = Validation score   (-root_mean_squared_error)
	3.01s	 = Training   runtime
	0.01s	 = Validation runtime
Fitting model: LightGBMXT_BAG_L2/T1 ... Training model for up to 131.58s of the 131.57s of remaining time.
	Fitting 7 child models (S1F2 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-33.4906	 = Validation score   (-root_mean_squared_error)
	20.31s	 = Training   runtime
	0.18s	 = Validation runtime
Fitting model: LightGBM_BAG_L2/T1 ... Training model for up to 109.73s of the 109.71s of remaining time.
	Fitting 7 child models (S1F2 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-32.9263	 = Validation score   (-root_mean_squared_error)
	19.26s	 = Training   runtime
	0.08s	 = Validation runtime
Fitting model: LightGBM_BAG_L2/T2 ... Training model for up to 88.18s of the 88.17s of remaining time.
	Fitting 7 child models (S1F2 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-33.6334	 = Validation score   (-root_mean_squared_error)
	21.86s	 = Training   runtime
	0.08s	 = Validation runtime
Fitting model: CatBoost_BAG_L2/T1 ... Training model for up to 64.37s of the 64.35s of remaining time.
	Fitting 7 child models (S1F2 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-32.5454	 = Validation score   (-root_mean_squared_error)
	24.35s	 = Training   runtime
	0.07s	 = Validation runtime
Fitting model: XGBoost_BAG_L2/T1 ... Training model for up to 38.65s of the 38.64s of remaining time.
	Fitting 7 child models (S1F2 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-33.5402	 = Validation score   (-root_mean_squared_error)
	19.53s	 = Training   runtime
	0.12s	 = Validation runtime
Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 17.54s of the 17.53s of remaining time.
	Fitting 7 child models (S1F2 - S1F8) | Fitting with ParallelLocalFoldFittingStrategy
	-33.8536	 = Validation score   (-root_mean_squared_error)
	30.66s	 = Training   runtime
	0.16s	 = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -13.28s of remaining time.
	-32.4574	 = Validation score   (-root_mean_squared_error)
	0.42s	 = Training   runtime
	0.0s	 = Validation runtime
AutoGluon training complete, total runtime = 613.92s ... Best model: "WeightedEnsemble_L2"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20221023_211333/")
In [27]:
predictor_new_hpo.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
                   model  score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0    WeightedEnsemble_L3 -32.303080      13.229172  448.804626                0.000901           0.387506            3       True         14
1    WeightedEnsemble_L2 -32.316131      12.340177  337.509858                0.001140           0.295377            2       True          6
2     CatBoost_BAG_L2/T1 -32.518530      12.449266  372.622336                0.110229          35.407855            2       True         11
3     LightGBM_BAG_L2/T1 -32.610988      12.477141  356.246051                0.138104          19.031570            2       True          9
4      XGBoost_BAG_L2/T1 -32.942820      12.481319  351.933692                0.142282          14.719211            2       True         12
5   LightGBMXT_BAG_L2/T1 -33.346366      12.574636  357.023190                0.235599          19.808709            2       True          7
6     LightGBM_BAG_L2/T2 -33.349182      12.474386  360.440445                0.135350          23.225964            2       True         10
7   LightGBMLarge_BAG_L1 -33.687603       0.111603    4.693507                0.111603           4.693507            1       True          5
8     LightGBM_BAG_L1/T1 -33.915986       2.554125   39.672345                2.554125          39.672345            1       True          2
9   LightGBMXT_BAG_L2/T2 -34.017983      12.602056  359.449775                0.263019          22.235294            2       True          8
10  LightGBMXT_BAG_L1/T1 -34.337455       9.090987   84.448378                9.090987          84.448378            1       True          1
11     XGBoost_BAG_L1/T1 -34.552675       0.541804   32.540259                0.541804          32.540259            1       True          4
12    CatBoost_BAG_L1/T1 -34.707230       0.152121  180.553499                0.152121         180.553499            1       True          3
13  LightGBMLarge_BAG_L2 -36.011828      12.358922  340.162013                0.019885           2.947532            2       True         13
Number of models trained: 14
Types of models trained:
{'StackerEnsembleModel_XGBoost', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_LGB', 'WeightedEnsembleModel'}
Bagging used: True  (with 8 folds)
Multi-layer stack-ensembling used: True  (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', [])             : 2 | ['season', 'weather']
('float', [])                : 3 | ['atemp', 'temp', 'windspeed']
('int', [])                  : 4 | ['day', 'hour', 'humidity', 'month']
('int', ['bool'])            : 3 | ['holiday', 'workingday', 'year']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20221019_223247/SummaryOfModels.html
*** End of fit() summary ***
Out[27]:
{'model_types': {'LightGBMXT_BAG_L1/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T1': 'StackerEnsembleModel_LGB',
  'CatBoost_BAG_L1/T1': 'StackerEnsembleModel_CatBoost',
  'XGBoost_BAG_L1/T1': 'StackerEnsembleModel_XGBoost',
  'LightGBMLarge_BAG_L1': 'StackerEnsembleModel_LGB',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel',
  'LightGBMXT_BAG_L2/T1': 'StackerEnsembleModel_LGB',
  'LightGBMXT_BAG_L2/T2': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2/T2': 'StackerEnsembleModel_LGB',
  'CatBoost_BAG_L2/T1': 'StackerEnsembleModel_CatBoost',
  'XGBoost_BAG_L2/T1': 'StackerEnsembleModel_XGBoost',
  'LightGBMLarge_BAG_L2': 'StackerEnsembleModel_LGB',
  'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
 'model_performance': {'LightGBMXT_BAG_L1/T1': -34.33745545082705,
  'LightGBM_BAG_L1/T1': -33.91598594261488,
  'CatBoost_BAG_L1/T1': -34.70722968272442,
  'XGBoost_BAG_L1/T1': -34.552674851674695,
  'LightGBMLarge_BAG_L1': -33.687602754409426,
  'WeightedEnsemble_L2': -32.31613067415643,
  'LightGBMXT_BAG_L2/T1': -33.34636635930924,
  'LightGBMXT_BAG_L2/T2': -34.01798318498792,
  'LightGBM_BAG_L2/T1': -32.61098830441786,
  'LightGBM_BAG_L2/T2': -33.34918196644897,
  'CatBoost_BAG_L2/T1': -32.51853023440451,
  'XGBoost_BAG_L2/T1': -32.9428199805874,
  'LightGBMLarge_BAG_L2': -36.01182792599411,
  'WeightedEnsemble_L3': -32.303080325729134},
 'model_best': 'WeightedEnsemble_L3',
 'model_paths': {'LightGBMXT_BAG_L1/T1': 'AutogluonModels/ag-20221019_223247/models/LightGBMXT_BAG_L1/T1/',
  'LightGBM_BAG_L1/T1': 'AutogluonModels/ag-20221019_223247/models/LightGBM_BAG_L1/T1/',
  'CatBoost_BAG_L1/T1': 'AutogluonModels/ag-20221019_223247/models/CatBoost_BAG_L1/T1/',
  'XGBoost_BAG_L1/T1': 'AutogluonModels/ag-20221019_223247/models/XGBoost_BAG_L1/T1/',
  'LightGBMLarge_BAG_L1': 'AutogluonModels/ag-20221019_223247/models/LightGBMLarge_BAG_L1/',
  'WeightedEnsemble_L2': 'AutogluonModels/ag-20221019_223247/models/WeightedEnsemble_L2/',
  'LightGBMXT_BAG_L2/T1': 'AutogluonModels/ag-20221019_223247/models/LightGBMXT_BAG_L2/T1/',
  'LightGBMXT_BAG_L2/T2': 'AutogluonModels/ag-20221019_223247/models/LightGBMXT_BAG_L2/T2/',
  'LightGBM_BAG_L2/T1': 'AutogluonModels/ag-20221019_223247/models/LightGBM_BAG_L2/T1/',
  'LightGBM_BAG_L2/T2': 'AutogluonModels/ag-20221019_223247/models/LightGBM_BAG_L2/T2/',
  'CatBoost_BAG_L2/T1': 'AutogluonModels/ag-20221019_223247/models/CatBoost_BAG_L2/T1/',
  'XGBoost_BAG_L2/T1': 'AutogluonModels/ag-20221019_223247/models/XGBoost_BAG_L2/T1/',
  'LightGBMLarge_BAG_L2': 'AutogluonModels/ag-20221019_223247/models/LightGBMLarge_BAG_L2/',
  'WeightedEnsemble_L3': 'AutogluonModels/ag-20221019_223247/models/WeightedEnsemble_L3/'},
 'model_fit_times': {'LightGBMXT_BAG_L1/T1': 84.44837832450867,
  'LightGBM_BAG_L1/T1': 39.67234492301941,
  'CatBoost_BAG_L1/T1': 180.55349850654602,
  'XGBoost_BAG_L1/T1': 32.54025936126709,
  'LightGBMLarge_BAG_L1': 4.693507432937622,
  'WeightedEnsemble_L2': 0.2953770160675049,
  'LightGBMXT_BAG_L2/T1': 19.808709144592285,
  'LightGBMXT_BAG_L2/T2': 22.235293865203857,
  'LightGBM_BAG_L2/T1': 19.031569957733154,
  'LightGBM_BAG_L2/T2': 23.225964069366455,
  'CatBoost_BAG_L2/T1': 35.40785527229309,
  'XGBoost_BAG_L2/T1': 14.719210863113403,
  'LightGBMLarge_BAG_L2': 2.9475321769714355,
  'WeightedEnsemble_L3': 0.38750624656677246},
 'model_pred_times': {'LightGBMXT_BAG_L1/T1': 9.090986967086792,
  'LightGBM_BAG_L1/T1': 2.554124593734741,
  'CatBoost_BAG_L1/T1': 0.15212106704711914,
  'XGBoost_BAG_L1/T1': 0.5418040752410889,
  'LightGBMLarge_BAG_L1': 0.1116032600402832,
  'WeightedEnsemble_L2': 0.0011401176452636719,
  'LightGBMXT_BAG_L2/T1': 0.23559927940368652,
  'LightGBMXT_BAG_L2/T2': 0.2630190849304199,
  'LightGBM_BAG_L2/T1': 0.13810420036315918,
  'LightGBM_BAG_L2/T2': 0.13534951210021973,
  'CatBoost_BAG_L2/T1': 0.1102294921875,
  'XGBoost_BAG_L2/T1': 0.14228224754333496,
  'LightGBMLarge_BAG_L2': 0.01988506317138672,
  'WeightedEnsemble_L3': 0.0009007453918457031},
 'num_bag_folds': 8,
 'max_stack_level': 3,
 'model_hyperparams': {'LightGBMXT_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'CatBoost_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMLarge_BAG_L1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L2': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMXT_BAG_L2/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'CatBoost_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBMLarge_BAG_L2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L3': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True}},
 'leaderboard':                    model  score_val  pred_time_val    fit_time  \
 0    WeightedEnsemble_L3 -32.303080      13.229172  448.804626   
 1    WeightedEnsemble_L2 -32.316131      12.340177  337.509858   
 2     CatBoost_BAG_L2/T1 -32.518530      12.449266  372.622336   
 3     LightGBM_BAG_L2/T1 -32.610988      12.477141  356.246051   
 4      XGBoost_BAG_L2/T1 -32.942820      12.481319  351.933692   
 5   LightGBMXT_BAG_L2/T1 -33.346366      12.574636  357.023190   
 6     LightGBM_BAG_L2/T2 -33.349182      12.474386  360.440445   
 7   LightGBMLarge_BAG_L1 -33.687603       0.111603    4.693507   
 8     LightGBM_BAG_L1/T1 -33.915986       2.554125   39.672345   
 9   LightGBMXT_BAG_L2/T2 -34.017983      12.602056  359.449775   
 10  LightGBMXT_BAG_L1/T1 -34.337455       9.090987   84.448378   
 11     XGBoost_BAG_L1/T1 -34.552675       0.541804   32.540259   
 12    CatBoost_BAG_L1/T1 -34.707230       0.152121  180.553499   
 13  LightGBMLarge_BAG_L2 -36.011828      12.358922  340.162013   
 
     pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  \
 0                 0.000901           0.387506            3       True   
 1                 0.001140           0.295377            2       True   
 2                 0.110229          35.407855            2       True   
 3                 0.138104          19.031570            2       True   
 4                 0.142282          14.719211            2       True   
 5                 0.235599          19.808709            2       True   
 6                 0.135350          23.225964            2       True   
 7                 0.111603           4.693507            1       True   
 8                 2.554125          39.672345            1       True   
 9                 0.263019          22.235294            2       True   
 10                9.090987          84.448378            1       True   
 11                0.541804          32.540259            1       True   
 12                0.152121         180.553499            1       True   
 13                0.019885           2.947532            2       True   
 
     fit_order  
 0          14  
 1           6  
 2          11  
 3           9  
 4          12  
 5           7  
 6          10  
 7           5  
 8           2  
 9           8  
 10          1  
 11          4  
 12          3  
 13         13  }

Let's plot scores of the top performers of the tested models.

In [12]:
predictor_new_hpo.leaderboard(silent=True).plot(kind="bar", x="model", y="score_val")
Out[12]:
<AxesSubplot:xlabel='model'>
In [28]:
# Remember to set all negative values to zero
test_new["count"] = 0
performance_new_hpo = predictor_new_hpo.evaluate(test_new)
print("The performance indicators are : \n", performance_new_hpo)
/usr/local/lib/python3.7/site-packages/scipy/stats/stats.py:4023: PearsonRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(PearsonRConstantInputWarning())
Evaluation: root_mean_squared_error on test data: -256.7605465791283
	Note: Scores are always higher_is_better. This metric score can be multiplied by -1 to get the metric value.
Evaluations on test data:
{
    "root_mean_squared_error": -256.7605465791283,
    "mean_squared_error": -65925.97827961271,
    "mean_absolute_error": -190.33855804882228,
    "r2": 0.0,
    "pearsonr": NaN,
    "median_absolute_error": -149.23776245117188
}
The performance indicators are : 
 {'root_mean_squared_error': -256.7605465791283, 'mean_squared_error': -65925.97827961271, 'mean_absolute_error': -190.33855804882228, 'r2': 0.0, 'pearsonr': nan, 'median_absolute_error': -149.23776245117188}
In [29]:
# Remember to set all negative values to zero
predictions_new_features = predictor_new_hpo.predict(test_new)
In [18]:
predictions_new_features
Out[18]:
0        12.811199
1         7.545908
2         7.033610
3         6.769294
4         6.894789
           ...    
6488    327.683197
6489    218.574661
6490    148.141708
6491     97.523758
6492     48.462696
Name: count, Length: 6493, dtype: float32
In [19]:
predictions_new_features[predictions_new_features<0]
Out[19]:
Series([], Name: count, dtype: float32)
In [23]:
# Same submitting predictions
submission_new_hpo = submission
submission_new_hpo["count"] = predictions_new_features
submission_new_hpo.to_csv("submission_new_hpo.csv", index=False)
In [24]:
!kaggle competitions submit -c bike-sharing-demand -f submission_new_hpo.csv -m "new features with hyperparameters"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 349kB/s]
Successfully submitted to Bike Sharing Demand
In [25]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description                                               status    publicScore  privateScore  
---------------------------  -------------------  --------------------------------------------------------  --------  -----------  ------------  
submission_new_hpo.csv       2022-10-19 22:30:48  new features with hyperparameters                         complete  0.48898      0.48898       
submission_new_features.csv  2022-10-19 20:23:34  new features + set weather, holiday, season, workingday   complete  1.80119      1.80119       
submission_new_features.csv  2022-10-19 19:25:25  new features + set weather, holiday, season, workingday   complete  1.80895      1.80895       
submission.csv               2022-10-19 18:30:41  3nd raw submission                                        complete  1.80895      1.80895       

New Score of 0.48898¶

Second hpo attempt¶

I will hereafter tune the hyperparameters of some of the models tested by autogluon. I will specifically tune models that were among the top 10 performers in the first hyperparameter tuning attempt. Therefore, in this section, I will test different parameters for LightGBM and XGBoost. Since CATboost usally performs well with default parameters, I will not tune it.

In [31]:
gbm_config = [{'num_boost_round': 100},  # number of boosting rounds (controls training time of GBM models)
              #'num_leaves': ag.space.Int(lower=10, upper=50),  # number of leaves in trees (integer hyperparameter)
              {'num_leaves': 70},
              {'num_leaves': 100},
              {'num_leaves':150}]



xgb_config = [{'eta ':0.1}, 
              {'eta':0.2}, 
              {'n_estimators':50},
              {'n_estimators':100}, 
              {'n_estimators':150}]

#dict = {‘RF’: [{‘criterion’: ‘gini’}, {‘criterion’: ‘entropy’}]}

hyperparameters = hyperparameters = {
                                       'GBM': gbm_config,
                                       #'CAT':cat_config,
                                       'XGB': xgb_config
                                      }
#hyperparameter_tune_kwargs = 'auto'
hyperparameter_tune_kwargs = {'searcher': 'auto'}

predictor_new_hpo = TabularPredictor(label="count", problem_type="regression", eval_metric="root_mean_squared_error").fit(
    train_data=train_new.loc[:, train_new.columns.difference(["casual","registered"])], time_limit=600, presets="best_quality",
    hyperparameters=hyperparameters, hyperparameter_tune_kwargs=hyperparameter_tune_kwargs)
No path specified. Models will be saved in: "AutogluonModels/ag-20221023_025044/"
Presets specified: ['best_quality']
Warning: hyperparameter tuning is currently experimental and may cause the process to hang.
Stack configuration (auto_stack=True): num_stack_levels=1, num_bag_folds=8, num_bag_sets=20
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20221023_025044/"
AutoGluon Version:  0.5.2
Python Version:     3.7.10
Operating System:   Linux
Train Data Rows:    10886
Train Data Columns: 13
Label Column: count
Preprocessing data ...
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
	Available Memory:                    2458.08 MB
	Train Data (Original)  Memory Usage: 0.83 MB (0.0% of available memory)
	Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
	Stage 1 Generators:
		Fitting AsTypeFeatureGenerator...
			Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
	Stage 2 Generators:
		Fitting FillNaFeatureGenerator...
	Stage 3 Generators:
		Fitting IdentityFeatureGenerator...
		Fitting CategoryFeatureGenerator...
			Fitting CategoryMemoryMinimizeFeatureGenerator...
		Fitting DatetimeFeatureGenerator...
	Stage 4 Generators:
		Fitting DropUniqueFeatureGenerator...
	Types of features in original data (raw dtype, special dtypes):
		('category', []) : 4 | ['holiday', 'season', 'weather', 'workingday']
		('datetime', []) : 1 | ['datetime']
		('float', [])    : 3 | ['atemp', 'temp', 'windspeed']
		('int', [])      : 5 | ['day', 'hour', 'humidity', 'month', 'year']
	Types of features in processed data (raw dtype, special dtypes):
		('category', [])             : 2 | ['season', 'weather']
		('float', [])                : 3 | ['atemp', 'temp', 'windspeed']
		('int', [])                  : 4 | ['day', 'hour', 'humidity', 'month']
		('int', ['bool'])            : 3 | ['holiday', 'workingday', 'year']
		('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
	0.1s = Fit runtime
	13 features in original data used to generate 17 features in processed data.
	Train Data (Processed) Memory Usage: 1.1 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.19s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
	This metric's sign has been flipped to adhere to being higher_is_better. The metric score can be multiplied by -1 to get the metric value.
	To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 9 L1 models ...
Hyperparameter tuning model: LightGBM_BAG_L1 ... Tuning model for up to 5.0s of the 599.81s of remaining time.
Warning: Exception caused LightGBM_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 996, in hyperparameter_tune
    hpo_executor.initialize(hyperparameter_tune_kwargs, default_num_trials=default_num_trials, time_limit=time_limit)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/hpo/executors.py", line 318, in initialize
    hyperparameter_tune_kwargs = scheduler_factory(hyperparameter_tune_kwargs, num_trials=num_trials, nthreads_per_trial='auto', ngpus_per_trial='auto')
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/scheduler/scheduler_factory.py", line 76, in scheduler_factory
    raise ValueError(f"Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {hyperparameter_tune_kwargs}")
ValueError: Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Hyperparameter tuning model: LightGBM_2_BAG_L1 ... Tuning model for up to 5.0s of the 599.79s of remaining time.
Warning: Exception caused LightGBM_2_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 996, in hyperparameter_tune
    hpo_executor.initialize(hyperparameter_tune_kwargs, default_num_trials=default_num_trials, time_limit=time_limit)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/hpo/executors.py", line 318, in initialize
    hyperparameter_tune_kwargs = scheduler_factory(hyperparameter_tune_kwargs, num_trials=num_trials, nthreads_per_trial='auto', ngpus_per_trial='auto')
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/scheduler/scheduler_factory.py", line 76, in scheduler_factory
    raise ValueError(f"Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {hyperparameter_tune_kwargs}")
ValueError: Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Hyperparameter tuning model: LightGBM_3_BAG_L1 ... Tuning model for up to 5.0s of the 599.77s of remaining time.
Warning: Exception caused LightGBM_3_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 996, in hyperparameter_tune
    hpo_executor.initialize(hyperparameter_tune_kwargs, default_num_trials=default_num_trials, time_limit=time_limit)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/hpo/executors.py", line 318, in initialize
    hyperparameter_tune_kwargs = scheduler_factory(hyperparameter_tune_kwargs, num_trials=num_trials, nthreads_per_trial='auto', ngpus_per_trial='auto')
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/scheduler/scheduler_factory.py", line 76, in scheduler_factory
    raise ValueError(f"Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {hyperparameter_tune_kwargs}")
ValueError: Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Hyperparameter tuning model: LightGBM_4_BAG_L1 ... Tuning model for up to 5.0s of the 599.75s of remaining time.
Warning: Exception caused LightGBM_4_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 996, in hyperparameter_tune
    hpo_executor.initialize(hyperparameter_tune_kwargs, default_num_trials=default_num_trials, time_limit=time_limit)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/hpo/executors.py", line 318, in initialize
    hyperparameter_tune_kwargs = scheduler_factory(hyperparameter_tune_kwargs, num_trials=num_trials, nthreads_per_trial='auto', ngpus_per_trial='auto')
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/scheduler/scheduler_factory.py", line 76, in scheduler_factory
    raise ValueError(f"Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {hyperparameter_tune_kwargs}")
ValueError: Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Hyperparameter tuning model: XGBoost_BAG_L1 ... Tuning model for up to 5.0s of the 599.73s of remaining time.
Warning: Exception caused XGBoost_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 996, in hyperparameter_tune
    hpo_executor.initialize(hyperparameter_tune_kwargs, default_num_trials=default_num_trials, time_limit=time_limit)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/hpo/executors.py", line 318, in initialize
    hyperparameter_tune_kwargs = scheduler_factory(hyperparameter_tune_kwargs, num_trials=num_trials, nthreads_per_trial='auto', ngpus_per_trial='auto')
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/scheduler/scheduler_factory.py", line 76, in scheduler_factory
    raise ValueError(f"Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {hyperparameter_tune_kwargs}")
ValueError: Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Hyperparameter tuning model: XGBoost_2_BAG_L1 ... Tuning model for up to 5.0s of the 599.71s of remaining time.
Warning: Exception caused XGBoost_2_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 996, in hyperparameter_tune
    hpo_executor.initialize(hyperparameter_tune_kwargs, default_num_trials=default_num_trials, time_limit=time_limit)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/hpo/executors.py", line 318, in initialize
    hyperparameter_tune_kwargs = scheduler_factory(hyperparameter_tune_kwargs, num_trials=num_trials, nthreads_per_trial='auto', ngpus_per_trial='auto')
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/scheduler/scheduler_factory.py", line 76, in scheduler_factory
    raise ValueError(f"Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {hyperparameter_tune_kwargs}")
ValueError: Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Hyperparameter tuning model: XGBoost_3_BAG_L1 ... Tuning model for up to 5.0s of the 599.69s of remaining time.
Warning: Exception caused XGBoost_3_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 996, in hyperparameter_tune
    hpo_executor.initialize(hyperparameter_tune_kwargs, default_num_trials=default_num_trials, time_limit=time_limit)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/hpo/executors.py", line 318, in initialize
    hyperparameter_tune_kwargs = scheduler_factory(hyperparameter_tune_kwargs, num_trials=num_trials, nthreads_per_trial='auto', ngpus_per_trial='auto')
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/scheduler/scheduler_factory.py", line 76, in scheduler_factory
    raise ValueError(f"Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {hyperparameter_tune_kwargs}")
ValueError: Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Hyperparameter tuning model: XGBoost_4_BAG_L1 ... Tuning model for up to 5.0s of the 599.68s of remaining time.
Warning: Exception caused XGBoost_4_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 996, in hyperparameter_tune
    hpo_executor.initialize(hyperparameter_tune_kwargs, default_num_trials=default_num_trials, time_limit=time_limit)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/hpo/executors.py", line 318, in initialize
    hyperparameter_tune_kwargs = scheduler_factory(hyperparameter_tune_kwargs, num_trials=num_trials, nthreads_per_trial='auto', ngpus_per_trial='auto')
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/scheduler/scheduler_factory.py", line 76, in scheduler_factory
    raise ValueError(f"Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {hyperparameter_tune_kwargs}")
ValueError: Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Hyperparameter tuning model: XGBoost_5_BAG_L1 ... Tuning model for up to 5.0s of the 599.66s of remaining time.
Warning: Exception caused XGBoost_5_BAG_L1 to fail during hyperparameter tuning... Skipping this model.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py", line 1385, in _train_single_full
    **model_fit_kwargs
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/models/abstract/abstract_model.py", line 996, in hyperparameter_tune
    hpo_executor.initialize(hyperparameter_tune_kwargs, default_num_trials=default_num_trials, time_limit=time_limit)
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/hpo/executors.py", line 318, in initialize
    hyperparameter_tune_kwargs = scheduler_factory(hyperparameter_tune_kwargs, num_trials=num_trials, nthreads_per_trial='auto', ngpus_per_trial='auto')
  File "/usr/local/lib/python3.7/site-packages/autogluon/core/scheduler/scheduler_factory.py", line 76, in scheduler_factory
    raise ValueError(f"Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {hyperparameter_tune_kwargs}")
ValueError: Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Required key 'scheduler' is not present in hyperparameter_tune_kwargs: {'searcher': 'auto'}
Completed 1/20 k-fold bagging repeats ...
No base models to train on, skipping auxiliary stack level 2...
No base models to train on, skipping stack level 2...
No base models to train on, skipping auxiliary stack level 3...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-c3b3a4e44df2> in <module>
     25 predictor_new_hpo = TabularPredictor(label="count", problem_type="regression", eval_metric="root_mean_squared_error").fit(
     26     train_data=train_new.loc[:, train_new.columns.difference(["casual","registered"])], time_limit=600, presets="best_quality",
---> 27     hyperparameters=hyperparameters, hyperparameter_tune_kwargs=hyperparameter_tune_kwargs)

/usr/local/lib/python3.7/site-packages/autogluon/core/utils/decorators.py in _call(*args, **kwargs)
     28         def _call(*args, **kwargs):
     29             gargs, gkwargs = g(*other_args, *args, **kwargs)
---> 30             return f(*gargs, **gkwargs)
     31         return _call
     32     return _unpack_inner

/usr/local/lib/python3.7/site-packages/autogluon/tabular/predictor/predictor.py in fit(self, train_data, tuning_data, time_limit, presets, hyperparameters, feature_metadata, infer_limit, infer_limit_batch_size, **kwargs)
    834                           hyperparameters=hyperparameters, core_kwargs=core_kwargs,
    835                           time_limit=time_limit, infer_limit=infer_limit, infer_limit_batch_size=infer_limit_batch_size,
--> 836                           verbosity=verbosity, use_bag_holdout=use_bag_holdout)
    837         self._set_post_fit_vars()
    838 

/usr/local/lib/python3.7/site-packages/autogluon/tabular/learner/abstract_learner.py in fit(self, X, X_val, **kwargs)
    116             raise AssertionError('Learner is already fit.')
    117         self._validate_fit_input(X=X, X_val=X_val, **kwargs)
--> 118         return self._fit(X=X, X_val=X_val, **kwargs)
    119 
    120     def _fit(self, X: DataFrame, X_val: DataFrame = None, scheduler_options=None, hyperparameter_tune=False,

/usr/local/lib/python3.7/site-packages/autogluon/tabular/learner/default_learner.py in _fit(self, X, X_val, X_unlabeled, holdout_frac, num_bag_folds, num_bag_sets, time_limit, infer_limit, infer_limit_batch_size, verbosity, **trainer_fit_kwargs)
    135             infer_limit_batch_size=infer_limit_batch_size,
    136             groups=groups,
--> 137             **trainer_fit_kwargs
    138         )
    139         self.save_trainer(trainer=trainer)

/usr/local/lib/python3.7/site-packages/autogluon/tabular/trainer/auto_trainer.py in fit(self, X, y, hyperparameters, X_val, y_val, X_unlabeled, holdout_frac, num_stack_levels, core_kwargs, time_limit, infer_limit, infer_limit_batch_size, use_bag_holdout, groups, **kwargs)
     94                                        infer_limit=infer_limit,
     95                                        infer_limit_batch_size=infer_limit_batch_size,
---> 96                                        groups=groups)
     97 
     98     def construct_model_templates_distillation(self, hyperparameters, **kwargs):

/usr/local/lib/python3.7/site-packages/autogluon/core/trainer/abstract_trainer.py in _train_multi_and_ensemble(self, X, y, X_val, y_val, hyperparameters, X_unlabeled, num_stack_levels, time_limit, groups, **kwargs)
   1666                                                   X_unlabeled=X_unlabeled, level_start=1, level_end=num_stack_levels+1, time_limit=time_limit, **kwargs)
   1667         if len(self.get_model_names()) == 0:
-> 1668             raise ValueError('AutoGluon did not successfully train any models')
   1669         return model_names_fit
   1670 

ValueError: AutoGluon did not successfully train any models
In [32]:
predictor_new_hpo.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
                   model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0    WeightedEnsemble_L3  -32.616308       8.485239  370.376266                0.001056           0.431738            3       True         43
1    WeightedEnsemble_L2  -32.739922       6.628335  189.298007                0.000808           0.767611            2       True         30
2     LightGBM_BAG_L2/T1  -32.762137       8.316424  340.986121                0.134214          17.128103            2       True         31
3    XGBoost_2_BAG_L2/T1  -32.880989       8.349969  352.816425                0.167759          28.958407            2       True         36
4   LightGBM_2_BAG_L2/T1  -33.223683       8.356585  354.616779                0.174376          30.758761            2       True         33
5   LightGBM_3_BAG_L2/T1  -33.446330       8.339879  358.925689                0.157669          35.067671            2       True         34
6     LightGBM_BAG_L2/T2  -33.530842       8.301745  342.932317                0.119535          19.074299            2       True         32
7   LightGBM_4_BAG_L2/T1  -33.556592       8.404984  367.835747                0.222774          43.977730            2       True         35
8   LightGBM_2_BAG_L1/T1  -33.820572       2.035105   38.132162                2.035105          38.132162            1       True          7
9   LightGBM_3_BAG_L1/T1  -33.824605       1.926668   38.537019                1.926668          38.537019            1       True          8
10  LightGBM_4_BAG_L1/T1  -34.112016       1.739962   40.472843                1.739962          40.472843            1       True          9
11   XGBoost_2_BAG_L1/T1  -34.529856       0.567162   40.906013                0.567162          40.906013            1       True         10
12   XGBoost_5_BAG_L1/T4  -35.439059       0.016123    1.317936                0.016123           1.317936            1       True         29
13   XGBoost_4_BAG_L1/T4  -35.439059       0.016234    1.270561                0.016234           1.270561            1       True         25
14    LightGBM_BAG_L1/T2  -35.590010       0.187944   16.445438                0.187944          16.445438            1       True          2
15   XGBoost_3_BAG_L2/T1  -35.790098       8.195417  324.409641                0.013207           0.551623            2       True         37
16   XGBoost_5_BAG_L2/T1  -35.790098       8.197323  325.094311                0.015113           1.236293            2       True         42
17   XGBoost_4_BAG_L2/T1  -35.790098       8.197680  324.728979                0.015471           0.870961            2       True         40
18   XGBoost_5_BAG_L1/T1  -35.867266       0.014343    0.551025                0.014343           0.551025            1       True         26
19   XGBoost_4_BAG_L2/T2  -35.923375       8.196890  324.720822                0.014681           0.862804            2       True         41
20    LightGBM_BAG_L1/T5  -36.364213       0.159667   16.055488                0.159667          16.055488            1       True          5
21   XGBoost_3_BAG_L1/T4  -36.985329       0.170686   14.036922                0.170686          14.036922            1       True         14
22   XGBoost_4_BAG_L1/T1  -37.171439       0.013148    0.494805                0.013148           0.494805            1       True         22
23    LightGBM_BAG_L1/T3  -38.964909       0.201738   16.041790                0.201738          16.041790            1       True          3
24   XGBoost_5_BAG_L1/T2  -39.877508       0.014994    0.627352                0.014994           0.627352            1       True         27
25    LightGBM_BAG_L1/T1  -41.324754       0.138658   16.239201                0.138658          16.239201            1       True          1
26   XGBoost_3_BAG_L1/T1  -42.183714       0.149132   11.119115                0.149132          11.119115            1       True         11
27   XGBoost_3_BAG_L2/T2  -44.398416       8.194961  324.290455                0.012751           0.432437            2       True         38
28   XGBoost_4_BAG_L1/T2  -45.036694       0.012968    0.408097                0.012968           0.408097            1       True         23
29  XGBoost_3_BAG_L1/T11  -46.565790       0.010453    0.196033                0.010453           0.196033            1       True         21
30   XGBoost_5_BAG_L1/T3  -46.809150       0.019303    1.483124                0.019303           1.483124            1       True         28
31    LightGBM_BAG_L1/T6  -57.139249       0.148825   16.597632                0.148825          16.597632            1       True          6
32   XGBoost_4_BAG_L1/T3  -61.054729       0.015561    0.937454                0.015561           0.937454            1       True         24
33   XGBoost_3_BAG_L1/T7  -65.839719       0.010491    0.184709                0.010491           0.184709            1       True         17
34   XGBoost_3_BAG_L1/T2  -70.802462       0.140320   10.417979                0.140320          10.417979            1       True         12
35   XGBoost_3_BAG_L2/T3  -71.821258       8.196570  324.537438                0.014360           0.679420            2       True         39
36   XGBoost_3_BAG_L1/T5  -81.053063       0.122843    9.956612                0.122843           9.956612            1       True         15
37   XGBoost_3_BAG_L1/T3 -106.663599       0.221608   12.177471                0.221608          12.177471            1       True         13
38    LightGBM_BAG_L1/T4 -112.509639       0.134455   16.921681                0.134455          16.921681            1       True          4
39   XGBoost_3_BAG_L1/T6 -121.451404       0.137436    9.800652                0.137436           9.800652            1       True         16
40   XGBoost_3_BAG_L1/T8 -152.508632       0.010308    0.212872                0.010308           0.212872            1       True         18
41   XGBoost_3_BAG_L1/T9 -172.894902       0.012863    0.237206                0.012863           0.237206            1       True         19
42  XGBoost_3_BAG_L1/T10 -198.856425       0.011681    0.351543                0.011681           0.351543            1       True         20
Number of models trained: 43
Types of models trained:
{'StackerEnsembleModel_XGBoost', 'WeightedEnsembleModel', 'StackerEnsembleModel_LGB'}
Bagging used: True  (with 8 folds)
Multi-layer stack-ensembling used: True  (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', [])             : 2 | ['season', 'weather']
('float', [])                : 3 | ['atemp', 'temp', 'windspeed']
('int', [])                  : 4 | ['day', 'hour', 'humidity', 'month']
('int', ['bool'])            : 3 | ['holiday', 'workingday', 'year']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20221023_020336/SummaryOfModels.html
*** End of fit() summary ***
Out[32]:
{'model_types': {'LightGBM_BAG_L1/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T2': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T3': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T4': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T5': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L1/T6': 'StackerEnsembleModel_LGB',
  'LightGBM_2_BAG_L1/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_3_BAG_L1/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_4_BAG_L1/T1': 'StackerEnsembleModel_LGB',
  'XGBoost_2_BAG_L1/T1': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L1/T1': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L1/T2': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L1/T3': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L1/T4': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L1/T5': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L1/T6': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L1/T7': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L1/T8': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L1/T9': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L1/T10': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L1/T11': 'StackerEnsembleModel_XGBoost',
  'XGBoost_4_BAG_L1/T1': 'StackerEnsembleModel_XGBoost',
  'XGBoost_4_BAG_L1/T2': 'StackerEnsembleModel_XGBoost',
  'XGBoost_4_BAG_L1/T3': 'StackerEnsembleModel_XGBoost',
  'XGBoost_4_BAG_L1/T4': 'StackerEnsembleModel_XGBoost',
  'XGBoost_5_BAG_L1/T1': 'StackerEnsembleModel_XGBoost',
  'XGBoost_5_BAG_L1/T2': 'StackerEnsembleModel_XGBoost',
  'XGBoost_5_BAG_L1/T3': 'StackerEnsembleModel_XGBoost',
  'XGBoost_5_BAG_L1/T4': 'StackerEnsembleModel_XGBoost',
  'WeightedEnsemble_L2': 'WeightedEnsembleModel',
  'LightGBM_BAG_L2/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_BAG_L2/T2': 'StackerEnsembleModel_LGB',
  'LightGBM_2_BAG_L2/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_3_BAG_L2/T1': 'StackerEnsembleModel_LGB',
  'LightGBM_4_BAG_L2/T1': 'StackerEnsembleModel_LGB',
  'XGBoost_2_BAG_L2/T1': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L2/T1': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L2/T2': 'StackerEnsembleModel_XGBoost',
  'XGBoost_3_BAG_L2/T3': 'StackerEnsembleModel_XGBoost',
  'XGBoost_4_BAG_L2/T1': 'StackerEnsembleModel_XGBoost',
  'XGBoost_4_BAG_L2/T2': 'StackerEnsembleModel_XGBoost',
  'XGBoost_5_BAG_L2/T1': 'StackerEnsembleModel_XGBoost',
  'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
 'model_performance': {'LightGBM_BAG_L1/T1': -41.324753683080274,
  'LightGBM_BAG_L1/T2': -35.59000961582356,
  'LightGBM_BAG_L1/T3': -38.96490884412152,
  'LightGBM_BAG_L1/T4': -112.50963897387132,
  'LightGBM_BAG_L1/T5': -36.36421309556182,
  'LightGBM_BAG_L1/T6': -57.13924895242837,
  'LightGBM_2_BAG_L1/T1': -33.82057213457704,
  'LightGBM_3_BAG_L1/T1': -33.824604698244464,
  'LightGBM_4_BAG_L1/T1': -34.112015577540326,
  'XGBoost_2_BAG_L1/T1': -34.52985570096839,
  'XGBoost_3_BAG_L1/T1': -42.18371382724795,
  'XGBoost_3_BAG_L1/T2': -70.80246208732468,
  'XGBoost_3_BAG_L1/T3': -106.66359894276964,
  'XGBoost_3_BAG_L1/T4': -36.985328592087086,
  'XGBoost_3_BAG_L1/T5': -81.05306303122312,
  'XGBoost_3_BAG_L1/T6': -121.45140392616798,
  'XGBoost_3_BAG_L1/T7': -65.83971903826267,
  'XGBoost_3_BAG_L1/T8': -152.50863166881652,
  'XGBoost_3_BAG_L1/T9': -172.89490195050985,
  'XGBoost_3_BAG_L1/T10': -198.85642528769088,
  'XGBoost_3_BAG_L1/T11': -46.56579010090366,
  'XGBoost_4_BAG_L1/T1': -37.17143912298946,
  'XGBoost_4_BAG_L1/T2': -45.036693586675085,
  'XGBoost_4_BAG_L1/T3': -61.05472874837559,
  'XGBoost_4_BAG_L1/T4': -35.43905891072602,
  'XGBoost_5_BAG_L1/T1': -35.86726614774864,
  'XGBoost_5_BAG_L1/T2': -39.87750845041545,
  'XGBoost_5_BAG_L1/T3': -46.80915033674734,
  'XGBoost_5_BAG_L1/T4': -35.43905891072602,
  'WeightedEnsemble_L2': -32.73992241379296,
  'LightGBM_BAG_L2/T1': -32.76213725311967,
  'LightGBM_BAG_L2/T2': -33.53084174739725,
  'LightGBM_2_BAG_L2/T1': -33.22368257729854,
  'LightGBM_3_BAG_L2/T1': -33.44633034585296,
  'LightGBM_4_BAG_L2/T1': -33.556591604696365,
  'XGBoost_2_BAG_L2/T1': -32.880989383934335,
  'XGBoost_3_BAG_L2/T1': -35.79009780560234,
  'XGBoost_3_BAG_L2/T2': -44.39841559511542,
  'XGBoost_3_BAG_L2/T3': -71.8212581695465,
  'XGBoost_4_BAG_L2/T1': -35.79009780560234,
  'XGBoost_4_BAG_L2/T2': -35.923374701289305,
  'XGBoost_5_BAG_L2/T1': -35.79009780560234,
  'WeightedEnsemble_L3': -32.616307536991854},
 'model_best': 'WeightedEnsemble_L3',
 'model_paths': {'LightGBM_BAG_L1/T1': 'AutogluonModels/ag-20221023_020336/models/LightGBM_BAG_L1/T1/',
  'LightGBM_BAG_L1/T2': 'AutogluonModels/ag-20221023_020336/models/LightGBM_BAG_L1/T2/',
  'LightGBM_BAG_L1/T3': 'AutogluonModels/ag-20221023_020336/models/LightGBM_BAG_L1/T3/',
  'LightGBM_BAG_L1/T4': 'AutogluonModels/ag-20221023_020336/models/LightGBM_BAG_L1/T4/',
  'LightGBM_BAG_L1/T5': 'AutogluonModels/ag-20221023_020336/models/LightGBM_BAG_L1/T5/',
  'LightGBM_BAG_L1/T6': 'AutogluonModels/ag-20221023_020336/models/LightGBM_BAG_L1/T6/',
  'LightGBM_2_BAG_L1/T1': 'AutogluonModels/ag-20221023_020336/models/LightGBM_2_BAG_L1/T1/',
  'LightGBM_3_BAG_L1/T1': 'AutogluonModels/ag-20221023_020336/models/LightGBM_3_BAG_L1/T1/',
  'LightGBM_4_BAG_L1/T1': 'AutogluonModels/ag-20221023_020336/models/LightGBM_4_BAG_L1/T1/',
  'XGBoost_2_BAG_L1/T1': 'AutogluonModels/ag-20221023_020336/models/XGBoost_2_BAG_L1/T1/',
  'XGBoost_3_BAG_L1/T1': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L1/T1/',
  'XGBoost_3_BAG_L1/T2': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L1/T2/',
  'XGBoost_3_BAG_L1/T3': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L1/T3/',
  'XGBoost_3_BAG_L1/T4': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L1/T4/',
  'XGBoost_3_BAG_L1/T5': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L1/T5/',
  'XGBoost_3_BAG_L1/T6': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L1/T6/',
  'XGBoost_3_BAG_L1/T7': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L1/T7/',
  'XGBoost_3_BAG_L1/T8': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L1/T8/',
  'XGBoost_3_BAG_L1/T9': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L1/T9/',
  'XGBoost_3_BAG_L1/T10': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L1/T10/',
  'XGBoost_3_BAG_L1/T11': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L1/T11/',
  'XGBoost_4_BAG_L1/T1': 'AutogluonModels/ag-20221023_020336/models/XGBoost_4_BAG_L1/T1/',
  'XGBoost_4_BAG_L1/T2': 'AutogluonModels/ag-20221023_020336/models/XGBoost_4_BAG_L1/T2/',
  'XGBoost_4_BAG_L1/T3': 'AutogluonModels/ag-20221023_020336/models/XGBoost_4_BAG_L1/T3/',
  'XGBoost_4_BAG_L1/T4': 'AutogluonModels/ag-20221023_020336/models/XGBoost_4_BAG_L1/T4/',
  'XGBoost_5_BAG_L1/T1': 'AutogluonModels/ag-20221023_020336/models/XGBoost_5_BAG_L1/T1/',
  'XGBoost_5_BAG_L1/T2': 'AutogluonModels/ag-20221023_020336/models/XGBoost_5_BAG_L1/T2/',
  'XGBoost_5_BAG_L1/T3': 'AutogluonModels/ag-20221023_020336/models/XGBoost_5_BAG_L1/T3/',
  'XGBoost_5_BAG_L1/T4': 'AutogluonModels/ag-20221023_020336/models/XGBoost_5_BAG_L1/T4/',
  'WeightedEnsemble_L2': 'AutogluonModels/ag-20221023_020336/models/WeightedEnsemble_L2/',
  'LightGBM_BAG_L2/T1': 'AutogluonModels/ag-20221023_020336/models/LightGBM_BAG_L2/T1/',
  'LightGBM_BAG_L2/T2': 'AutogluonModels/ag-20221023_020336/models/LightGBM_BAG_L2/T2/',
  'LightGBM_2_BAG_L2/T1': 'AutogluonModels/ag-20221023_020336/models/LightGBM_2_BAG_L2/T1/',
  'LightGBM_3_BAG_L2/T1': 'AutogluonModels/ag-20221023_020336/models/LightGBM_3_BAG_L2/T1/',
  'LightGBM_4_BAG_L2/T1': 'AutogluonModels/ag-20221023_020336/models/LightGBM_4_BAG_L2/T1/',
  'XGBoost_2_BAG_L2/T1': 'AutogluonModels/ag-20221023_020336/models/XGBoost_2_BAG_L2/T1/',
  'XGBoost_3_BAG_L2/T1': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L2/T1/',
  'XGBoost_3_BAG_L2/T2': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L2/T2/',
  'XGBoost_3_BAG_L2/T3': 'AutogluonModels/ag-20221023_020336/models/XGBoost_3_BAG_L2/T3/',
  'XGBoost_4_BAG_L2/T1': 'AutogluonModels/ag-20221023_020336/models/XGBoost_4_BAG_L2/T1/',
  'XGBoost_4_BAG_L2/T2': 'AutogluonModels/ag-20221023_020336/models/XGBoost_4_BAG_L2/T2/',
  'XGBoost_5_BAG_L2/T1': 'AutogluonModels/ag-20221023_020336/models/XGBoost_5_BAG_L2/T1/',
  'WeightedEnsemble_L3': 'AutogluonModels/ag-20221023_020336/models/WeightedEnsemble_L3/'},
 'model_fit_times': {'LightGBM_BAG_L1/T1': 16.239201068878174,
  'LightGBM_BAG_L1/T2': 16.445437908172607,
  'LightGBM_BAG_L1/T3': 16.041790008544922,
  'LightGBM_BAG_L1/T4': 16.92168140411377,
  'LightGBM_BAG_L1/T5': 16.055487871170044,
  'LightGBM_BAG_L1/T6': 16.59763240814209,
  'LightGBM_2_BAG_L1/T1': 38.13216161727905,
  'LightGBM_3_BAG_L1/T1': 38.5370192527771,
  'LightGBM_4_BAG_L1/T1': 40.472843170166016,
  'XGBoost_2_BAG_L1/T1': 40.906012773513794,
  'XGBoost_3_BAG_L1/T1': 11.119115114212036,
  'XGBoost_3_BAG_L1/T2': 10.41797924041748,
  'XGBoost_3_BAG_L1/T3': 12.177471160888672,
  'XGBoost_3_BAG_L1/T4': 14.036921501159668,
  'XGBoost_3_BAG_L1/T5': 9.956611633300781,
  'XGBoost_3_BAG_L1/T6': 9.800651550292969,
  'XGBoost_3_BAG_L1/T7': 0.1847093105316162,
  'XGBoost_3_BAG_L1/T8': 0.21287155151367188,
  'XGBoost_3_BAG_L1/T9': 0.23720574378967285,
  'XGBoost_3_BAG_L1/T10': 0.35154271125793457,
  'XGBoost_3_BAG_L1/T11': 0.19603323936462402,
  'XGBoost_4_BAG_L1/T1': 0.49480533599853516,
  'XGBoost_4_BAG_L1/T2': 0.4080970287322998,
  'XGBoost_4_BAG_L1/T3': 0.9374542236328125,
  'XGBoost_4_BAG_L1/T4': 1.2705605030059814,
  'XGBoost_5_BAG_L1/T1': 0.5510251522064209,
  'XGBoost_5_BAG_L1/T2': 0.6273515224456787,
  'XGBoost_5_BAG_L1/T3': 1.4831244945526123,
  'XGBoost_5_BAG_L1/T4': 1.3179364204406738,
  'WeightedEnsemble_L2': 0.767611026763916,
  'LightGBM_BAG_L2/T1': 17.128103494644165,
  'LightGBM_BAG_L2/T2': 19.074299335479736,
  'LightGBM_2_BAG_L2/T1': 30.758761405944824,
  'LightGBM_3_BAG_L2/T1': 35.067671060562134,
  'LightGBM_4_BAG_L2/T1': 43.9777295589447,
  'XGBoost_2_BAG_L2/T1': 28.958407402038574,
  'XGBoost_3_BAG_L2/T1': 0.5516231060028076,
  'XGBoost_3_BAG_L2/T2': 0.4324371814727783,
  'XGBoost_3_BAG_L2/T3': 0.6794204711914062,
  'XGBoost_4_BAG_L2/T1': 0.8709614276885986,
  'XGBoost_4_BAG_L2/T2': 0.8628041744232178,
  'XGBoost_5_BAG_L2/T1': 1.236293077468872,
  'WeightedEnsemble_L3': 0.43173789978027344},
 'model_pred_times': {'LightGBM_BAG_L1/T1': 0.138657808303833,
  'LightGBM_BAG_L1/T2': 0.1879441738128662,
  'LightGBM_BAG_L1/T3': 0.2017383575439453,
  'LightGBM_BAG_L1/T4': 0.13445544242858887,
  'LightGBM_BAG_L1/T5': 0.1596670150756836,
  'LightGBM_BAG_L1/T6': 0.14882540702819824,
  'LightGBM_2_BAG_L1/T1': 2.035104990005493,
  'LightGBM_3_BAG_L1/T1': 1.926668405532837,
  'LightGBM_4_BAG_L1/T1': 1.739962100982666,
  'XGBoost_2_BAG_L1/T1': 0.5671615600585938,
  'XGBoost_3_BAG_L1/T1': 0.14913177490234375,
  'XGBoost_3_BAG_L1/T2': 0.14031982421875,
  'XGBoost_3_BAG_L1/T3': 0.22160816192626953,
  'XGBoost_3_BAG_L1/T4': 0.1706860065460205,
  'XGBoost_3_BAG_L1/T5': 0.12284255027770996,
  'XGBoost_3_BAG_L1/T6': 0.1374359130859375,
  'XGBoost_3_BAG_L1/T7': 0.010491132736206055,
  'XGBoost_3_BAG_L1/T8': 0.010307788848876953,
  'XGBoost_3_BAG_L1/T9': 0.0128631591796875,
  'XGBoost_3_BAG_L1/T10': 0.011681079864501953,
  'XGBoost_3_BAG_L1/T11': 0.010452508926391602,
  'XGBoost_4_BAG_L1/T1': 0.013148069381713867,
  'XGBoost_4_BAG_L1/T2': 0.012968063354492188,
  'XGBoost_4_BAG_L1/T3': 0.015561103820800781,
  'XGBoost_4_BAG_L1/T4': 0.01623392105102539,
  'XGBoost_5_BAG_L1/T1': 0.014343023300170898,
  'XGBoost_5_BAG_L1/T2': 0.014993667602539062,
  'XGBoost_5_BAG_L1/T3': 0.019303083419799805,
  'XGBoost_5_BAG_L1/T4': 0.016123056411743164,
  'WeightedEnsemble_L2': 0.0008077621459960938,
  'LightGBM_BAG_L2/T1': 0.1342144012451172,
  'LightGBM_BAG_L2/T2': 0.11953520774841309,
  'LightGBM_2_BAG_L2/T1': 0.1743755340576172,
  'LightGBM_3_BAG_L2/T1': 0.1576690673828125,
  'LightGBM_4_BAG_L2/T1': 0.22277402877807617,
  'XGBoost_2_BAG_L2/T1': 0.16775941848754883,
  'XGBoost_3_BAG_L2/T1': 0.013207435607910156,
  'XGBoost_3_BAG_L2/T2': 0.012751340866088867,
  'XGBoost_3_BAG_L2/T3': 0.014360427856445312,
  'XGBoost_4_BAG_L2/T1': 0.015470504760742188,
  'XGBoost_4_BAG_L2/T2': 0.014680624008178711,
  'XGBoost_5_BAG_L2/T1': 0.015113115310668945,
  'WeightedEnsemble_L3': 0.0010557174682617188},
 'num_bag_folds': 8,
 'max_stack_level': 3,
 'model_hyperparams': {'LightGBM_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T3': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T4': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T5': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L1/T6': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_2_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_3_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_4_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_2_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L1/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L1/T3': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L1/T4': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L1/T5': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L1/T6': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L1/T7': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L1/T8': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L1/T9': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L1/T10': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L1/T11': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_4_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_4_BAG_L1/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_4_BAG_L1/T3': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_4_BAG_L1/T4': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_5_BAG_L1/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_5_BAG_L1/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_5_BAG_L1/T3': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_5_BAG_L1/T4': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L2': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_BAG_L2/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_2_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_3_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'LightGBM_4_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_2_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L2/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_3_BAG_L2/T3': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_4_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_4_BAG_L2/T2': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'XGBoost_5_BAG_L2/T1': {'use_orig_features': True,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True},
  'WeightedEnsemble_L3': {'use_orig_features': False,
   'max_base_models': 25,
   'max_base_models_per_type': 5,
   'save_bag_folds': True}},
 'leaderboard':                    model   score_val  pred_time_val    fit_time  \
 0    WeightedEnsemble_L3  -32.616308       8.485239  370.376266   
 1    WeightedEnsemble_L2  -32.739922       6.628335  189.298007   
 2     LightGBM_BAG_L2/T1  -32.762137       8.316424  340.986121   
 3    XGBoost_2_BAG_L2/T1  -32.880989       8.349969  352.816425   
 4   LightGBM_2_BAG_L2/T1  -33.223683       8.356585  354.616779   
 5   LightGBM_3_BAG_L2/T1  -33.446330       8.339879  358.925689   
 6     LightGBM_BAG_L2/T2  -33.530842       8.301745  342.932317   
 7   LightGBM_4_BAG_L2/T1  -33.556592       8.404984  367.835747   
 8   LightGBM_2_BAG_L1/T1  -33.820572       2.035105   38.132162   
 9   LightGBM_3_BAG_L1/T1  -33.824605       1.926668   38.537019   
 10  LightGBM_4_BAG_L1/T1  -34.112016       1.739962   40.472843   
 11   XGBoost_2_BAG_L1/T1  -34.529856       0.567162   40.906013   
 12   XGBoost_5_BAG_L1/T4  -35.439059       0.016123    1.317936   
 13   XGBoost_4_BAG_L1/T4  -35.439059       0.016234    1.270561   
 14    LightGBM_BAG_L1/T2  -35.590010       0.187944   16.445438   
 15   XGBoost_3_BAG_L2/T1  -35.790098       8.195417  324.409641   
 16   XGBoost_5_BAG_L2/T1  -35.790098       8.197323  325.094311   
 17   XGBoost_4_BAG_L2/T1  -35.790098       8.197680  324.728979   
 18   XGBoost_5_BAG_L1/T1  -35.867266       0.014343    0.551025   
 19   XGBoost_4_BAG_L2/T2  -35.923375       8.196890  324.720822   
 20    LightGBM_BAG_L1/T5  -36.364213       0.159667   16.055488   
 21   XGBoost_3_BAG_L1/T4  -36.985329       0.170686   14.036922   
 22   XGBoost_4_BAG_L1/T1  -37.171439       0.013148    0.494805   
 23    LightGBM_BAG_L1/T3  -38.964909       0.201738   16.041790   
 24   XGBoost_5_BAG_L1/T2  -39.877508       0.014994    0.627352   
 25    LightGBM_BAG_L1/T1  -41.324754       0.138658   16.239201   
 26   XGBoost_3_BAG_L1/T1  -42.183714       0.149132   11.119115   
 27   XGBoost_3_BAG_L2/T2  -44.398416       8.194961  324.290455   
 28   XGBoost_4_BAG_L1/T2  -45.036694       0.012968    0.408097   
 29  XGBoost_3_BAG_L1/T11  -46.565790       0.010453    0.196033   
 30   XGBoost_5_BAG_L1/T3  -46.809150       0.019303    1.483124   
 31    LightGBM_BAG_L1/T6  -57.139249       0.148825   16.597632   
 32   XGBoost_4_BAG_L1/T3  -61.054729       0.015561    0.937454   
 33   XGBoost_3_BAG_L1/T7  -65.839719       0.010491    0.184709   
 34   XGBoost_3_BAG_L1/T2  -70.802462       0.140320   10.417979   
 35   XGBoost_3_BAG_L2/T3  -71.821258       8.196570  324.537438   
 36   XGBoost_3_BAG_L1/T5  -81.053063       0.122843    9.956612   
 37   XGBoost_3_BAG_L1/T3 -106.663599       0.221608   12.177471   
 38    LightGBM_BAG_L1/T4 -112.509639       0.134455   16.921681   
 39   XGBoost_3_BAG_L1/T6 -121.451404       0.137436    9.800652   
 40   XGBoost_3_BAG_L1/T8 -152.508632       0.010308    0.212872   
 41   XGBoost_3_BAG_L1/T9 -172.894902       0.012863    0.237206   
 42  XGBoost_3_BAG_L1/T10 -198.856425       0.011681    0.351543   
 
     pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  \
 0                 0.001056           0.431738            3       True   
 1                 0.000808           0.767611            2       True   
 2                 0.134214          17.128103            2       True   
 3                 0.167759          28.958407            2       True   
 4                 0.174376          30.758761            2       True   
 5                 0.157669          35.067671            2       True   
 6                 0.119535          19.074299            2       True   
 7                 0.222774          43.977730            2       True   
 8                 2.035105          38.132162            1       True   
 9                 1.926668          38.537019            1       True   
 10                1.739962          40.472843            1       True   
 11                0.567162          40.906013            1       True   
 12                0.016123           1.317936            1       True   
 13                0.016234           1.270561            1       True   
 14                0.187944          16.445438            1       True   
 15                0.013207           0.551623            2       True   
 16                0.015113           1.236293            2       True   
 17                0.015471           0.870961            2       True   
 18                0.014343           0.551025            1       True   
 19                0.014681           0.862804            2       True   
 20                0.159667          16.055488            1       True   
 21                0.170686          14.036922            1       True   
 22                0.013148           0.494805            1       True   
 23                0.201738          16.041790            1       True   
 24                0.014994           0.627352            1       True   
 25                0.138658          16.239201            1       True   
 26                0.149132          11.119115            1       True   
 27                0.012751           0.432437            2       True   
 28                0.012968           0.408097            1       True   
 29                0.010453           0.196033            1       True   
 30                0.019303           1.483124            1       True   
 31                0.148825          16.597632            1       True   
 32                0.015561           0.937454            1       True   
 33                0.010491           0.184709            1       True   
 34                0.140320          10.417979            1       True   
 35                0.014360           0.679420            2       True   
 36                0.122843           9.956612            1       True   
 37                0.221608          12.177471            1       True   
 38                0.134455          16.921681            1       True   
 39                0.137436           9.800652            1       True   
 40                0.010308           0.212872            1       True   
 41                0.012863           0.237206            1       True   
 42                0.011681           0.351543            1       True   
 
     fit_order  
 0          43  
 1          30  
 2          31  
 3          36  
 4          33  
 5          34  
 6          32  
 7          35  
 8           7  
 9           8  
 10          9  
 11         10  
 12         29  
 13         25  
 14          2  
 15         37  
 16         42  
 17         40  
 18         26  
 19         41  
 20          5  
 21         14  
 22         22  
 23          3  
 24         27  
 25          1  
 26         11  
 27         38  
 28         23  
 29         21  
 30         28  
 31          6  
 32         24  
 33         17  
 34         12  
 35         39  
 36         15  
 37         13  
 38          4  
 39         16  
 40         18  
 41         19  
 42         20  }
In [33]:
predictor_new_hpo.leaderboard(silent=True).plot(kind="bar", x="model", y="score_val")
Out[33]:
<AxesSubplot:xlabel='model'>
In [34]:
# Remember to set all negative values to zero
test_new["count"] = 0
performance_new_hpo = predictor_new_hpo.evaluate(test_new)
print("The performance indicators are : \n", performance_new_hpo)
/usr/local/lib/python3.7/site-packages/scipy/stats/stats.py:4023: PearsonRConstantInputWarning: An input array is constant; the correlation coefficient is not defined.
  warnings.warn(PearsonRConstantInputWarning())
Evaluation: root_mean_squared_error on test data: -257.90773442592734
	Note: Scores are always higher_is_better. This metric score can be multiplied by -1 to get the metric value.
Evaluations on test data:
{
    "root_mean_squared_error": -257.90773442592734,
    "mean_squared_error": -66516.39947671468,
    "mean_absolute_error": -190.9576504444573,
    "r2": 0.0,
    "pearsonr": NaN,
    "median_absolute_error": -147.15931701660156
}
The performance indicators are : 
 {'root_mean_squared_error': -257.90773442592734, 'mean_squared_error': -66516.39947671468, 'mean_absolute_error': -190.9576504444573, 'r2': 0.0, 'pearsonr': nan, 'median_absolute_error': -147.15931701660156}
In [35]:
# Remember to set all negative values to zero
predictions_new_features = predictor_new_hpo.predict(test_new)
In [36]:
predictions_new_features
Out[36]:
0        10.835062
1         7.043610
2         6.781248
3         6.762746
4         6.762746
           ...    
6488    359.697021
6489    215.506653
6490    159.988556
6491    101.330910
6492     56.446304
Name: count, Length: 6493, dtype: float32
In [37]:
predictions_new_features[predictions_new_features<0]
Out[37]:
Series([], Name: count, dtype: float32)
In [38]:
# Same submitting predictions
submission_new_hpo = submission
submission_new_hpo["count"] = predictions_new_features
submission_new_hpo.to_csv("submission_new_hpo.csv", index=False)
In [39]:
!kaggle competitions submit -c bike-sharing-demand -f submission_new_hpo.csv -m "new features with hyperparameter tuning of GBM and XGBoost"
100%|█████████████████████████████████████████| 188k/188k [00:00<00:00, 381kB/s]
Successfully submitted to Bike Sharing Demand
In [40]:
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
fileName                     date                 description                                                 status    publicScore  privateScore  
---------------------------  -------------------  ----------------------------------------------------------  --------  -----------  ------------  
submission_new_hpo.csv       2022-10-23 02:55:42  new features with hyperparameter tuning of GBM and XGBoost  complete  0.47866      0.47866       
submission_new_hpo.csv       2022-10-23 02:24:32  new features with hyperparameter tuning of GBM and XGBoost  complete  0.47866      0.47866       
submission_new_hpo.csv       2022-10-19 22:30:48  new features with hyperparameters                           complete  0.48898      0.48898       
submission_new_features.csv  2022-10-19 20:23:34  new features + set weather, holiday, season, workingday     complete  1.80119      1.80119       

Testing a new algorithm to compare performance with autogluon¶

In [28]:
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators = 100, random_state = 2020)
#Et on lance le training sur notre dataset de train
rf.fit(train_new[train_new.columns.difference(['count','datetime'])], train_new.count)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-83fc54d835ff> in <module>
      3 rf = RandomForestRegressor(n_estimators = 100, random_state = 2020)
      4 #Et on lance le training sur notre dataset de train
----> 5 rf.fit(train_new[train_new.columns.difference(['count','datetime'])], train_new.count)

/usr/local/lib/python3.7/site-packages/sklearn/ensemble/_forest.py in fit(self, X, y, sample_weight)
    326             raise ValueError("sparse multilabel-indicator for y is not supported.")
    327         X, y = self._validate_data(
--> 328             X, y, multi_output=True, accept_sparse="csc", dtype=DTYPE
    329         )
    330         if sample_weight is not None:

/usr/local/lib/python3.7/site-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
    574                 y = check_array(y, **check_y_params)
    575             else:
--> 576                 X, y = check_X_y(X, y, **check_params)
    577             out = X, y
    578 

/usr/local/lib/python3.7/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    969     )
    970 
--> 971     y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric)
    972 
    973     check_consistent_length(X, y)

/usr/local/lib/python3.7/site-packages/sklearn/utils/validation.py in _check_y(y, multi_output, y_numeric)
    980     if multi_output:
    981         y = check_array(
--> 982             y, accept_sparse="csr", force_all_finite=True, ensure_2d=False, dtype=None
    983         )
    984     else:

/usr/local/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    793 
    794     if ensure_min_samples > 0:
--> 795         n_samples = _num_samples(array)
    796         if n_samples < ensure_min_samples:
    797             raise ValueError(

/usr/local/lib/python3.7/site-packages/sklearn/utils/validation.py in _num_samples(x)
    267         if len(x.shape) == 0:
    268             raise TypeError(
--> 269                 "Singleton array %r cannot be considered a valid collection." % x
    270             )
    271         # Check that shape is returning an integer or default to len

TypeError: Singleton array array(<bound method DataFrame.count of                  datetime  season  holiday  workingday  weather   temp  \
0     2011-01-01 00:00:00       1        0           0        1   9.84   
1     2011-01-01 01:00:00       1        0           0        1   9.02   
2     2011-01-01 02:00:00       1        0           0        1   9.02   
3     2011-01-01 03:00:00       1        0           0        1   9.84   
4     2011-01-01 04:00:00       1        0           0        1   9.84   
...                   ...     ...      ...         ...      ...    ...   
10881 2012-12-19 19:00:00       4        0           1        1  15.58   
10882 2012-12-19 20:00:00       4        0           1        1  14.76   
10883 2012-12-19 21:00:00       4        0           1        1  13.94   
10884 2012-12-19 22:00:00       4        0           1        1  13.94   
10885 2012-12-19 23:00:00       4        0           1        1  13.12   

        atemp  humidity  windspeed  casual  registered  count  year  month  \
0      14.395        81     0.0000       3          13     16  2011      1   
1      13.635        80     0.0000       8          32     40  2011      1   
2      13.635        80     0.0000       5          27     32  2011      1   
3      14.395        75     0.0000       3          10     13  2011      1   
4      14.395        75     0.0000       0           1      1  2011      1   
...       ...       ...        ...     ...         ...    ...   ...    ...   
10881  19.695        50    26.0027       7         329    336  2012     12   
10882  17.425        57    15.0013      10         231    241  2012     12   
10883  15.910        61    15.0013       4         164    168  2012     12   
10884  17.425        61     6.0032      12         117    129  2012     12   
10885  16.665        66     8.9981       4          84     88  2012     12   

       day  hour  
0        1     0  
1        1     1  
2        1     2  
3        1     3  
4        1     4  
...    ...   ...  
10881   19    19  
10882   19    20  
10883   19    21  
10884   19    22  
10885   19    23  

[10886 rows x 16 columns]>, dtype=object) cannot be considered a valid collection.

New Score of 0.47866¶

Step 7: Write a Report¶

Refer to the markdown file for the full report¶

Creating plots and table for report¶

In [30]:
# Taking the top model score from each training run and creating a line plot to show improvement
# You can create these in the notebook and save them to PNG or use some other tool (e.g. google sheets, excel)
fig = pd.DataFrame(
    {
        "model": ["initial", "add_features", "hpo1", "hpo2"],
        "score": [ -52.7, -30.18, -32.3, -32.6]
    }
).plot(x="model", y="score", figsize=(8, 6)).get_figure()
fig.savefig('img/model_train_score_project.png')
In [31]:
# Take the 3 kaggle scores and creating a line plot to show improvement
fig = pd.DataFrame(
    {
        "test_eval": ["initial", "add_features", "hpo1", "hpo2"],
        "score": [1.809, 1.801, 0.49, 0.48]
    }
).plot(x="test_eval", y="score", figsize=(8, 6)).get_figure()
fig.savefig('img/model_test_score_project.png')

Hyperparameter table¶

In [42]:
# The 3 hyperparameters we tuned with the kaggle score as the result
pd.DataFrame({
    "model": ["initial", "add_features", "hpo"],
    "time_limit": [600, 600, 600],
    "presets": ["best_quality", "best_quality", "best_quality"],
    "hyperparameters": ['default', 'default', "{'GBM': gbm_config,'XGB': xgb_config}"],
    "hyperparameter_tune_kwargs":["-", "auto", "{'searcher':'auto'}"],
    "score": [1.809, 0.49, 0.48]
})
Out[42]:
model time_limit presets hyperparameters hyperparameter_tune_kwargs score
0 initial 600 best_quality default - 1.809
1 add_features 600 best_quality default auto 0.490
2 hpo 600 best_quality {'GBM': gbm_config,'XGB': xgb_config} {'searcher':'auto'} 0.480